Segmentation violation during checkpoint restore (cpu mode) #49
Comments
Cricket currently supports C/R only when you use the runtime API exclusively. It looks like your checkpoint contains a call to a driver API function, for which there is currently no C/R support.
Hi! By runtime API, do you mean the ./gpu part of Cricket? Yes, I was able to generate a checkpoint for one of your samples (probably test_apps/matmul.cu). Do you have any plans to add C/R support for "cpu" mode? Best,
Hey, I mean the CUDA Runtime API (see https://docs.nvidia.com/cuda/cuda-runtime-api/index.html).
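To make the runtime/driver distinction concrete, here is a minimal sketch that sorts function names by NVIDIA's public naming convention: runtime API entry points start with `cuda` (e.g. `cudaMalloc`), driver API entry points start with `cu` followed by an uppercase letter (e.g. `cuMemAlloc`). The helper name `classify_cuda_symbol` and the "internal" bucket for `__cuda*` names are assumptions made for illustration; none of this is part of Cricket, and it is a name-based heuristic only.

```python
import re

# Heuristic based on the public CUDA naming convention (an assumption that
# every symbol of interest follows it):
#   runtime API : cudaMalloc, cudaMemcpy, cudaLaunchKernel, ...
#   driver API  : cuInit, cuMemAlloc, cuModuleGetFunction, ...
#   internal    : __cudaRegisterFunction and similar registration hooks
_INTERNAL = re.compile(r"^__cuda[A-Z]\w*$")
_RUNTIME = re.compile(r"^cuda[A-Z]\w*$")
_DRIVER = re.compile(r"^cu[A-Z]\w*$")


def classify_cuda_symbol(name: str) -> str:
    """Return 'internal', 'runtime', 'driver', or 'other' for a plain symbol name."""
    if _INTERNAL.match(name):
        return "internal"
    if _RUNTIME.match(name):
        return "runtime"
    if _DRIVER.match(name):
        return "driver"
    return "other"


if __name__ == "__main__":
    for symbol in ("cudaMalloc", "cuMemAlloc_v2", "__cudaRegisterFunction", "printf"):
        print(f"{symbol}: {classify_cuda_symbol(symbol)}")
```

The sketch works on unmangled names only; C++-mangled symbols such as `_Z16cudaLaunchKernel...` fall into "other" unless they are demangled first.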
Hi! AFAIU, invoking of
Have you linked to the CUDA libraries dynamically, i.e., using
I also encounter this bug when compiling my simple CUDA application with the shared cudart library. Seems like
$ nm matrixMult.bin | grep cuda
0000000000001be4 t _Z16cudaLaunchKernelIcE9cudaErrorPKT_4dim3S4_PPvmP11CUstream_st
0000000000005048 b _ZL20__cudaFatCubinHandle
0000000000005070 b _ZL20__cudaFatCubinHandle
0000000000005050 b _ZL22__cudaPrelinkedFatbins
0000000000001b86 t _ZL24__sti____cudaRegisterAllv
000000000000191d t _ZL26__cudaUnregisterBinaryUtilv
0000000000001b20 t _ZL31__nv_cudaEntityRegisterCallbackPPv
0000000000005080 b _ZZL31__nv_cudaEntityRegisterCallbackPPvE5__ref
U [email protected]
U [email protected]
U [email protected]
U [email protected]
U [email protected]
U [email protected]
0000000000001329 t __cudaUnregisterBinaryUtil
U [email protected]
U [email protected]
U [email protected]
U [email protected]
U [email protected]
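For anyone who wants to check the same thing on their own binary: the `U` entries are the symbols the dynamic linker resolves at load time, so they show whether cudart was linked as a shared library. Below is a rough sketch that parses default `nm` output and lists those entries. The type and function names (`NmSymbol`, `parse_nm_line`, `undefined_symbols`) are made up for this example, and the two `U` lines in `SAMPLE` are hypothetical stand-ins, since the real symbol names in the listing above were lost.

```python
from typing import List, NamedTuple, Optional


class NmSymbol(NamedTuple):
    address: Optional[int]   # None for undefined symbols
    kind: str                # nm type letter, e.g. "U", "t", "b"
    name: str                # symbol name without the version suffix
    version: Optional[str]   # text after "@" or "@@", if present


def parse_nm_line(line: str) -> Optional[NmSymbol]:
    """Parse one line of default `nm` output; return None if it does not fit."""
    parts = line.split()
    if len(parts) == 3:
        try:
            address: Optional[int] = int(parts[0], 16)
        except ValueError:
            return None
        kind, raw = parts[1], parts[2]
    elif len(parts) == 2:
        address, kind, raw = None, parts[0], parts[1]
    else:
        return None
    name, _, version = raw.partition("@")
    return NmSymbol(address, kind, name, version.lstrip("@") or None)


def undefined_symbols(nm_output: str) -> List[NmSymbol]:
    """Return the entries marked 'U', i.e. resolved from shared libraries at load time."""
    parsed = (parse_nm_line(line) for line in nm_output.splitlines())
    return [sym for sym in parsed if sym is not None and sym.kind == "U"]


# The first two lines are copied from the listing above; the "U" lines are
# hypothetical examples, not the actual output of the reporter's binary.
SAMPLE = """\
0000000000001be4 t _Z16cudaLaunchKernelIcE9cudaErrorPKT_4dim3S4_PPvmP11CUstream_st
0000000000005048 b _ZL20__cudaFatCubinHandle
                 U cudaMalloc@libcudart.so.11.0
                 U __cudaRegisterFunction@libcudart.so.11.0
"""

if __name__ == "__main__":
    for sym in undefined_symbols(SAMPLE):
        print(sym.name, sym.version)
```

If the list of undefined symbols contains no cudart entries at all, the binary was most likely linked against the static runtime instead.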
Hi!
I want to try Cricket for C/R in cpu mode (no in-kernel checkpointing). However, when I run the restore, it fails with a segfault.
After a little debugging, I found that the problem comes from using rpc_register_function_1_svc in the restore process (see gdb trace). The comments say that it does not support checkpoint/restore, but I have not found a way to avoid it, because it is called from __cudaRegisterFunction on the client side.
Does this mean that C/R does not currently work in Cricket for cpu mode? Thank you!