
Kernel crash with Tesla P40 GPU on CUDA 12.1, but works fine on Google Colab with CUDA 12.0 #378

Closed
mabergerx opened this issue Jul 25, 2023 · 7 comments

Comments

@mabergerx

I'm experiencing a kernel crash when running the faster-whisper model on a Tesla P40 GPU in my offline environment, while the same package/model works perfectly fine on Google Colab equipped with a Tesla T4 GPU.

Environment Details:

Offline Environment:
GPU: Tesla P40
NVIDIA-SMI: 525.105.17
Driver Version: 525.105.17
CUDA Version: 12.1

Google Colab:
GPU: Tesla T4
NVIDIA-SMI: 525.105.17
Driver Version: 525.105.17
CUDA Version: 12.0

Observations:

Multiple GitHub issues here suggested the package is optimized for CUDA 11 rather than CUDA 12. However, since it works in Google Colab with CUDA 12.0, I'm curious why my offline setup with CUDA 12.1 crashes. I also don't see anything in the logs of my Jupyter process; the kernel just dies. Since I'm only transcribing a small audio file, and the output of nvidia-smi looks normal, I can't imagine this is an OOM error.

The model works perfectly on my offline machine when running on a CPU.

Standard whisper models and other HuggingFace models operate smoothly on my GPU.

I receive the following warning when executing the model:

[2023-07-25 08:50:17.414] [ctranslate2] [thread 482] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

When I explicitly specify float16, the model crashes citing the aforementioned reason.
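For reference, this is roughly the call pattern I'm using (the model size and audio file name below are just placeholders):

from faster_whisper import WhisperModel

# Placeholder model size and file name; compute_type can be set explicitly
# (e.g. "float16", "float32", "int8").
model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(segment.start, segment.end, segment.text)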

Dependencies

I see some interesting things in my pip freeze within my env:

nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
ctranslate2==3.17.1

These packages seem to refer to CUDA 11 stuff... I don't know if that could be an issue.

I'd appreciate insights into why the kernel crashes in my offline setup, even though another environment (Google Colab), also on CUDA 12, doesn't experience this issue. It seems like there might be nuances with CUDA 12.1, or maybe something in the configuration of my machine.

Thanks in advance!

@guillaumekln
Contributor

faster-whisper requires CUDA 11. We don't expect it to work with CUDA 12.

Since you have some CUDA 11 packages installed with pip, you could make use of them using this technique: #153 (comment)

@mabergerx
Author

Thanks for the comment.

From the issue you sent, when I perform os.path.dirname(nvidia.cublas.lib.__file__), I get nothing. Does it mean cuBLAS is not actually present on the machine? This seems weird as I thought that it is installed automatically when CUDA is installed.

Moreover, I am still a bit confused: why does the package work on Google Colab with CUDA 12.0 and the same driver? The only thing I can pinpoint for now is the cuBLAS issue, as cuBLAS is probably present on the Colab machine. Doing

! cat /usr/local/cuda/include/cublas.h | grep CUBLAS

on the Colab returns output, while /usr/local/cuda/include on my machine doesn't contain anything cuBLAS-related.
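For completeness, this is the quick check I'm running to see whether the pip-installed cuBLAS wheel is importable at all (just a diagnostic sketch):

import os

try:
    import nvidia.cublas.lib
    # Directory containing the cuBLAS shared libraries shipped with the pip wheel.
    print("pip-installed cuBLAS libs:", os.path.dirname(nvidia.cublas.lib.__file__))
except ImportError:
    print("the nvidia-cublas-cu11 wheel is not importable in this environment")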

I know this question has kinda derailed from the faster-whisper package problem, but maybe it is relevant for others facing a similar issue.

@guillaumekln
Contributor

guillaumekln commented Jul 25, 2023

Moreover, I am still a bit confused: why does the package work on Google Colab with CUDA 12.0 and the same driver?

Google Colab uses CUDA 11.8.

The "CUDA Version: 12.0" that you see in nvidia-smi corresponds to the CUDA version associated with the GPU driver, but it does not mean that this CUDA version is installed on the system.

From the issue you sent, when I perform os.path.dirname(nvidia.cublas.lib.__file__), I get nothing. Does it mean cuBLAS is not actually present on the machine? This seems weird as I thought that it is installed automatically when CUDA is installed.

The CUDA libraries installed with pip and the CUDA libraries installed on the system are 2 different things.

If your current Python environment contains the CUDA libraries listed above, then the technique shown in #153 (comment) should work.
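A minimal sketch of that technique on Linux, assuming the nvidia-cublas-cu11 and nvidia-cudnn-cu11 wheels from your pip freeze are importable (adjust to your environment):

import os
import nvidia.cublas.lib
import nvidia.cudnn.lib

# Directories holding the CUDA 11 shared libraries shipped with the pip wheels.
paths = [
    os.path.dirname(nvidia.cublas.lib.__file__),
    os.path.dirname(nvidia.cudnn.lib.__file__),
]
# Add this output to LD_LIBRARY_PATH before starting the Jupyter/Python process,
# e.g. export LD_LIBRARY_PATH="<output>:$LD_LIBRARY_PATH"
print(":".join(paths))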

If for some reason you can't use these libraries installed with pip, then you should install CUDA 11 and cuDNN for CUDA 11 on the system, either by following the installation instructions from NVIDIA or by using a Docker image.

@EricKong1985

File "D:\Python310\lib\site-packages\faster_whisper\transcribe.py", line 573, in encode
return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: Library cublas64_11.dll is not found or cannot be loaded
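A common workaround on Windows (not verified here; the path below is only an example) is to make sure the directory that actually contains cublas64_11.dll is on the DLL search path before the model is loaded:

import os

# Example path only; point this at wherever cublas64_11.dll actually lives
# (a CUDA 11.x installation or the directory holding the downloaded cuBLAS DLLs).
cuda11_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin"

os.environ["PATH"] = cuda11_bin + os.pathsep + os.environ.get("PATH", "")
os.add_dll_directory(cuda11_bin)  # Python 3.8+ on Windows

from faster_whisper import WhisperModel
model = WhisperModel("small", device="cuda")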

@EricKong1985

Looks like it is also not well supported on Windows with CUDA 12.

@orderer0001

Can it be updated to be compatible with CUDA 12?

@guillaumekln
Contributor

Updating to CUDA 12 is not planned in the very short term. See an explanation here: #47 (comment).
