Between "No GPU/TPU found, falling back to CPU." and "failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error" #157

Open
sbisw002 opened this issue Mar 19, 2024 · 0 comments

I am trying to get JAX running on the GPU of my new ThinkPad, which has an "NVIDIA RTX 4000 Ada 12 GB" graphics card.

No matter which "cuda-driver(12.4)+cudnn+jax+jaxlib" combination I try, the best result I get is either a) "No GPU/TPU found, falling back to CPU." or b) "failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error".

When I run the Data Sampler section from https://github.com/PredictiveIntelligenceLab/ImprovedDeepONets/blob/main/Stokes/PI_DeepONet_Stokes.ipynb, I get errors like:

a)
Installation:
pip install jaxlib==0.4.7+cuda12.cudnn88 jax==0.4.7 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Run:
runfile('/home/saumya/NeuralN/Op Net/ImprovedDeepONets/Stokes/PI_DeepONet_Stokes-Copy1', wdir='/home/saumya/NeuralN/Op Net/ImprovedDeepONets/Stokes')
2024-03-19 11:48:27.682846: I external/xla/xla/service/service.cc:168] XLA service 0x8dd95c0 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2024-03-19 11:48:27.682867: I external/xla/xla/service/service.cc:176] StreamExecutor device (0): Interpreter,
2024-03-19 11:48:27.689135: I external/xla/xla/pjrt/tfrt_cpu_pjrt_client.cc:218] TfrtCpuClient created.
2024-03-19 11:48:29.450971: E external/xla/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2024-03-19 11:48:29.450988: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: saumya-TP-GPU
2024-03-19 11:48:29.450991: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: saumya-TP-GPU
2024-03-19 11:48:29.451052: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 550.54.14
2024-03-19 11:48:29.451064: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: NOT_FOUND: could not find kernel module information in driver version file contents: "NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 550.54.14 Release Build (dvs-builder@U16-A24-2-2) Thu Feb 22 01:44:50 UTC 2024
GCC version: gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)
"
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

b)
Installation:
pip install jaxlib==0.4.9+cuda12.cudnn88 jax==0.4.9 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Run:
2024-03-19 12:10:31.130411: I external/xla/xla/service/service.cc:168] XLA service 0x6a1d490 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2024-03-19 12:10:31.130427: I external/xla/xla/service/service.cc:176] StreamExecutor device (0): Interpreter,
2024-03-19 12:10:31.134477: I external/xla/xla/pjrt/tfrt_cpu_pjrt_client.cc:433] TfrtCpuClient created.
2024-03-19 12:10:50.428065: E external/xla/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2024-03-19 12:10:50.428083: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: saumya-TP-GPU
2024-03-19 12:10:50.428086: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: saumya-TP-GPU
2024-03-19 12:10:50.428143: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 550.54.14
2024-03-19 12:10:50.428156: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: NOT_FOUND: could not find kernel module information in driver version file contents: "NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 550.54.14 Release Build (dvs-builder@U16-A24-2-2) Thu Feb 22 01:44:50 UTC 2024
GCC version: gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)
"
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
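
Both logs quote the driver version file verbatim, and the quoted text does contain 550.54.14, the same version that libcuda reports. The sketch below is one way to confirm that independently of the diagnostic in the log. It assumes the text comes from /proc/driver/nvidia/version (the usual location on Linux; adjust if your system differs). The helper names `kernel_module_version` and `is_open_kernel_module` are hypothetical and not part of any library. Why the diagnostic prints NOT_FOUND is not confirmed by the log; one possibility is that the "Open Kernel Module" wording differs from the format it expects.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the NVIDIA Linux driver exposes its "NVRM version" text here.
DRIVER_VERSION_FILE = Path("/proc/driver/nvidia/version")


def kernel_module_version(text: str) -> Optional[str]:
    """Return the first dotted version on the 'NVRM version:' line, or None."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"\b\d+\.\d+(?:\.\d+)?\b", line)
            return match.group(0) if match else None
    return None


def is_open_kernel_module(text: str) -> bool:
    """True if the NVRM line mentions the open kernel module variant."""
    return any(
        line.startswith("NVRM version:") and "Open Kernel Module" in line
        for line in text.splitlines()
    )


if __name__ == "__main__":
    if DRIVER_VERSION_FILE.exists():
        contents = DRIVER_VERSION_FILE.read_text()
        print("kernel module version:", kernel_module_version(contents))
        print("open kernel module:", is_open_kernel_module(contents))
    else:
        print(f"{DRIVER_VERSION_FILE} not found; is the NVIDIA kernel module loaded?")
```

If the version printed here matches the libcuda version in the log, a plain version mismatch between user-space library and kernel module is less likely to be the cause of the cuInit failure.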

My system:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

$ nvidia-smi
Tue Mar 19 12:21:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 ERR! Off | 00000000:01:00.0 N/A | N/A |
|ERR! ERR! ERR! N/A / N/A | 14MiB / 12282MiB | N/A Default |
| | | ERR! |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
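
The ERR! fields in the table above suggest the driver itself cannot read the GPU state, which would affect any CUDA program and not only JAX. A minimal sketch for checking this from a script, assuming `nvidia-smi` is on PATH and using its `--query-gpu` and `--format=csv,noheader` options: the function name `parse_gpu_rows` and the list of error markers are hypothetical, and the exact text nvidia-smi prints for a failing field in query mode is an assumption.

```python
import shutil
import subprocess
from typing import Dict, List

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]

# Assumption: a failing field shows up as text containing one of these markers.
ERROR_MARKERS = ("ERR", "Error", "N/A")


def parse_gpu_rows(csv_text: str) -> List[Dict[str, object]]:
    """Parse 'name, driver_version, memory.total' rows into dictionaries."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip lines that are not in the expected three-column form
        name, driver, memory = fields
        healthy = not any(marker in field for field in fields for marker in ERROR_MARKERS)
        rows.append({"name": name, "driver": driver, "memory": memory, "healthy": healthy})
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi is not on PATH")
    else:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
        for row in parse_gpu_rows(result.stdout):
            print(row)
        if result.returncode != 0:
            print("nvidia-smi exited with", result.returncode, result.stderr.strip())
```

A row flagged as unhealthy here points at the driver or kernel module rather than at the jax/jaxlib versions.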

Python version:
$ whereis python | tr ' ' '\n' | grep ^/ | sort
/home/saumya/anaconda3/envs/OpNet/bin/python
$ python --version && python3 --version
Python 3.9.18
Python 3.9.18
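
To confirm which jax and jaxlib builds the OpNet environment actually imports, and which backend JAX selects, a short check like the following can help. It is a sketch under assumptions: the function names `installed_version` and `backend_report` are hypothetical, and it relies only on `importlib.metadata`, `jax.default_backend()`, and `jax.devices()`. For a CUDA wheel, the reported jaxlib version would be expected to carry a local tag such as the `+cuda12.cudnn88` suffix used in the install commands.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def backend_report() -> Dict[str, object]:
    """Collect jax/jaxlib versions and the backend JAX picks, if importable."""
    report: Dict[str, object] = {
        "jax": installed_version("jax"),
        "jaxlib": installed_version("jaxlib"),
        "backend": None,
        "devices": [],
    }
    try:
        import jax
    except ImportError:
        return report  # jax is not installed in this environment
    try:
        report["backend"] = jax.default_backend()
        report["devices"] = [str(device) for device in jax.devices()]
    except Exception as exc:  # backend initialisation can fail in several ways
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in backend_report().items():
        print(f"{key}: {value}")
```

A backend of "cpu" together with a CUDA-tagged jaxlib version would be consistent with the logs above: the CUDA build is installed, but initialising the driver fails.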
