
Kernel crash with Tesla P40 GPU on CUDA 12.1, but works fine on Google Colab with CUDA 12.0 #378

Closed
mabergerx opened this issue Jul 25, 2023 · 7 comments

Comments

@mabergerx

I'm experiencing a kernel crash when running the faster-whisper model on a Tesla P40 GPU in my offline environment, while the same package/model works perfectly fine on Google Colab equipped with a Tesla T4 GPU.

Environment Details:

Offline Environment:
GPU: Tesla P40
NVIDIA-SMI: 525.105.17
Driver Version: 525.105.17
CUDA Version: 12.1

Google Colab:
GPU: Tesla T4
NVIDIA-SMI: 525.105.17
Driver Version: 525.105.17
CUDA Version: 12.0

Observations:

Multiple GitHub issues here suggested the package is optimized for CUDA 11 rather than CUDA 12. However, since it works in Google Colab with CUDA 12.0, I'm curious why my offline setup with CUDA 12.1 crashes. I also don't see anything in the logs of my Jupyter process; the kernel just dies. Since I'm only transcribing a small audio file, and the output of nvidia-smi looks normal, I can't imagine this is an OOM error.

The model works perfectly on my offline machine when running on a CPU.

Standard whisper models and other HuggingFace models operate smoothly on my GPU.

I receive the following warning when executing the model:

[2023-07-25 08:50:17.414] [ctranslate2] [thread 482] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

When I explicitly specify float16, the model crashes citing the aforementioned reason.
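For reference, this is roughly the call pattern I'm using (the model size and audio file name below are just placeholders):

from faster_whisper import WhisperModel

# Placeholder model size and file name; compute_type can be set explicitly
# (e.g. "float16", "float32", "int8").
model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(segment.start, segment.end, segment.text)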

Dependencies

I see some interesting things in my pip freeze within my env:

nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
ctranslate2==3.17.1

These packages seem to refer to CUDA 11 stuff... I don't know if that could be an issue.

I'd appreciate insights into why the kernel crashes in my offline setup, even though another environment (Google Colab), also on CUDA 12, doesn't experience this issue. It seems like there might be nuances with CUDA 12.1, or maybe something in the configuration of my machine.

Thanks in advance!

@guillaumekln
Contributor

faster-whisper requires CUDA 11. We don't expect it to work with CUDA 12.

Since you have some CUDA 11 packages installed with pip, you could make use of them using this technique: #153 (comment)

@mabergerx
Author

Thanks for the comment.

From the issue you sent, when I perform os.path.dirname(nvidia.cublas.lib.__file__), I get nothing. Does it mean cuBLAS is not actually present on the machine? This seems weird as I thought that it is installed automatically when CUDA is installed.

Moreover, I am still a bit confused: why does the package work on Google Colab with CUDA 12.0 and the same driver? The only thing I can pinpoint for now is the cuBLAS issue, as cuBLAS is probably present on the Colab machine. Doing

! cat /usr/local/cuda/include/cublas.h | grep CUBLAS

on the Colab returns output, while /usr/local/cuda/include on my machine doesn't contain anything cuBLAS-related.
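For completeness, this is the quick check I'm running to see whether the pip-installed cuBLAS wheel is importable at all (just a diagnostic sketch):

import os

try:
    import nvidia.cublas.lib
    # Directory containing the cuBLAS shared libraries shipped with the pip wheel.
    print("pip-installed cuBLAS libs:", os.path.dirname(nvidia.cublas.lib.__file__))
except ImportError:
    print("the nvidia-cublas-cu11 wheel is not importable in this environment")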

I know this question has kinda derailed from the faster-whisper package problem, but maybe it is relevant for others facing a similar issue.

@guillaumekln
Contributor

guillaumekln commented Jul 25, 2023

Moreover, I am still a bit confused: why does the package work on Google Colab with CUDA 12.0 and the same driver?

Google Colab uses CUDA 11.8.

The "CUDA Version: 12.0" that you see in nvidia-smi corresponds to the CUDA version associated with the GPU driver, but it does not mean that this CUDA version is installed on the system.

From the issue you sent, when I perform os.path.dirname(nvidia.cublas.lib.__file__), I get nothing. Does it mean cuBLAS is not actually present on the machine? This seems weird as I thought that it is installed automatically when CUDA is installed.

The CUDA libraries installed with pip and the CUDA libraries installed on the system are 2 different things.

If your current Python environment contains the CUDA libraries listed above, then the technique shown in #153 (comment) should work.
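A minimal sketch of that technique on Linux, assuming the nvidia-cublas-cu11 and nvidia-cudnn-cu11 wheels from your pip freeze are importable (adjust to your environment):

import os
import nvidia.cublas.lib
import nvidia.cudnn.lib

# Directories holding the CUDA 11 shared libraries shipped with the pip wheels.
paths = [
    os.path.dirname(nvidia.cublas.lib.__file__),
    os.path.dirname(nvidia.cudnn.lib.__file__),
]
# Add this output to LD_LIBRARY_PATH before starting the Jupyter/Python process,
# e.g. export LD_LIBRARY_PATH="<output>:$LD_LIBRARY_PATH"
print(":".join(paths))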

If for some reason you can't use these libraries installed with pip, then you should install CUDA 11 and cuDNN for CUDA 11 on the system, either by following the installation instructions from NVIDIA or by using a Docker image.

@EricKong1985

File "D:\Python310\lib\site-packages\faster_whisper\transcribe.py", line 573, in encode
return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: Library cublas64_11.dll is not found or cannot be loaded
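A common workaround on Windows (not verified here; the path below is only an example) is to make sure the directory that actually contains cublas64_11.dll is on the DLL search path before the model is loaded:

import os

# Example path only; point this at wherever cublas64_11.dll actually lives
# (a CUDA 11.x installation or the directory holding the downloaded cuBLAS DLLs).
cuda11_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin"

os.environ["PATH"] = cuda11_bin + os.pathsep + os.environ.get("PATH", "")
os.add_dll_directory(cuda11_bin)  # Python 3.8+ on Windows

from faster_whisper import WhisperModel
model = WhisperModel("small", device="cuda")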

@EricKong1985

Looks like it is also not well supported on Windows with CUDA 12.

@orderer0001

Can it be updated to be compatible with CUDA 12?

@guillaumekln
Contributor

Updating to CUDA 12 is not planned in the very short term. See an explanation here: #47 (comment).
