
Device::cuda_if_available() is false #907

Open
Jaroslove opened this issue Oct 31, 2024 · 11 comments

@Jaroslove

I use tch = "0.18.0" on Windows 11.
Downloaded libtorch-win-shared-with-deps-2.5.0+cu118.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

My env vars are:
LIBTORCH=D:\softwares\Libtorch\libtorch-win-shared-with-deps-2.5.0+cu118\libtorch
TORCH_CUDA_VERSION=11.8
TORCH_VERSION=cu118

Could anybody help me? What do I need to do to get CUDA working?

@Anivie

Anivie commented Nov 9, 2024

Same in an Ubuntu environment. Is there any way to solve this?

@vincestorm

Exactly the same issue on Windows 11.

@LaurentMazare
Owner

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary.
It would also be good if you could check out the tch-rs repo and run cargo run --example basics -r; this should print whether or not the CUDA bits are found.
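For reference, a minimal build.rs along those lines might look like the following sketch. This is based on the flags other commenters in this thread report using for Linux-style linkers, not the exact tch-rs script; DEP_TCH_LIBTORCH_LIB is the libtorch lib directory that torch-sys exports to dependent build scripts.

```rust
// build.rs -- a minimal sketch for Linux-style linkers (an assumption,
// not the exact tch-rs build script).
fn cargo_link_args() -> Vec<String> {
    let mut args = Vec::new();
    // torch-sys exports the libtorch lib directory as DEP_TCH_LIBTORCH_LIB;
    // embedding it as an rpath lets the dynamic linker find the .so files.
    if let Some(lib_path) = std::env::var_os("DEP_TCH_LIBTORCH_LIB") {
        args.push(format!("-Wl,-rpath={}", lib_path.to_string_lossy()));
    }
    // Keep libtorch in the binary's DT_NEEDED entries even if no symbol is
    // referenced directly, so the CUDA backend library gets pulled in too.
    args.push("-Wl,--no-as-needed".to_string());
    args.push("-ltorch".to_string());
    args
}

fn main() {
    for arg in cargo_link_args() {
        println!("cargo:rustc-link-arg={arg}");
    }
}
```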

@Anivie

Anivie commented Nov 9, 2024

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary. It would also be good if you could check out the tch-rs repo and run cargo run --example basics -r; this should print whether or not the CUDA bits are found.

Thanks! I tried your suggestion, but unfortunately, the problem still persists.
I checked the contents of target/debug/build/torch-sys-xxx/output and saw that it is already trying to link torch_cuda, but tch::Cuda::is_available is still false.

...
cargo:rustc-link-lib=stdc++
cargo:rustc-link-lib=static=tch
cargo:rustc-link-lib=torch_cuda
cargo:rustc-link-lib=torch_cpu
cargo:rustc-link-lib=torch
cargo:rustc-link-lib=c10
cargo:rustc-link-lib=gomp
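A quick way to check whether the final binary actually kept the torch_cuda dependency (with as-needed linking, the linker can drop it even though -ltorch_cuda was requested at build time) is to list the binary's shared-library dependencies. A sketch; /bin/ls is only a placeholder path standing in for your own binary:

```shell
# Print the shared libraries a binary depends on. Substitute your compiled
# binary (e.g. target/release/examples/basics) for /bin/ls, which is just
# a placeholder here. For a correctly linked tch binary, libtorch_cuda.so
# should show up in this list.
ldd /bin/ls
```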

@Anivie

Anivie commented Nov 9, 2024

Well, I'm not sure whether this is the final solution, and I still can't tell exactly what the problem is, but it seems to have solved my issue for now.

The answer in this link worked for me.

use std::ffi::CString;
use libc::{dlopen, RTLD_LAZY};

fn main() {
    // Force-load libtorch_cuda so its symbols are registered before tch
    // queries CUDA availability.
    let path = CString::new("/root/libtorch/lib/libtorch_cuda.so").unwrap();
    unsafe {
        dlopen(path.as_ptr(), RTLD_LAZY);
    }

    println!("cuda: {}", tch::Cuda::is_available());
    println!("cudnn: {}", tch::Cuda::cudnn_is_available());
}

@pubfnbar

pubfnbar commented Nov 18, 2024

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary.

I had this same issue on Ubuntu 22.04.5, tch 0.18.0, and libtorch 2.5.0 cxx11-abi for CUDA 12.4 with rustc 1.84.0-nightly (81eef2d36 2024-11-11), i.e. all the following lines printed "false":

println!("{:?}", tch::Cuda::is_available());
println!("{:?}", tch::utils::has_cuda());
println!("{:?}", tch::utils::has_cudart());

And when I added the following build.rs to my crate root...

fn main() {
    let os = std::env::var("CARGO_CFG_TARGET_OS").expect("Unable to get TARGET_OS");
    match os.as_str() {
        "linux" | "windows" => {
            if let Some(lib_path) = std::env::var_os("DEP_TCH_LIBTORCH_LIB") {
                println!("cargo:rustc-link-arg=-Wl,-rpath={}", lib_path.to_string_lossy());
            }
            println!("cargo:rustc-link-arg=-Wl,--no-as-needed");
            println!("cargo:rustc-link-arg=-Wl,--copy-dt-needed-entries");
            println!("cargo:rustc-link-arg=-ltorch");
        }
        _ => {}
    }
}

...the build failed with the following error:

error: linking with `cc` failed: exit status: 1
  |
  = note: LC_ALL="C" PATH=<paths> VSLANG="1033" "cc" "-m64" <o-file paths> "-Wl,--as-needed" "-Wl,-Bstatic" <rlib-file paths> "-Wl,-Bdynamic" "-lstdc++" "-ltorch_cuda" "-ltorch_cpu" "-ltorch" "-lc10" "-lgomp" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-B/home/myname/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/gcc-ld" "-fuse-ld=lld" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/usr/local/libtorch/lib" "-L" "mycrate/target/debug/build/torch-sys-474ddb9a018375cc/out" "-L" "mycrate/target/debug/build/bzip2-sys-06df454e922f14bd/out/lib" "-L" "mycrate/target/debug/build/zstd-sys-226893407ff59632/out" "-L" "/home/myname/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "mycrate/target/debug/deps/mycrate-4e0e8b81a3a3e1af" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-nodefaultlibs" "-Wl,--no-as-needed" "-Wl,--copy-dt-needed-entries" "-ltorch"
  = note: rust-lld: error: unknown argument '--copy-dt-needed-entries'
          collect2: error: ld returned 1 exit status

But simply removing the following line from build.rs fixed that error, and now CUDA is detected by tch just fine:

println!("cargo:rustc-link-arg=-Wl,--copy-dt-needed-entries");

By the way, according to the ld manual (which I believe lld tries to adhere to), it seems that --copy-dt-needed-entries is the default option anyway, so perhaps it's not actually needed.

@davencyw

Had the same issue with the same settings as @pubfnbar, and his solution worked. I remember that it worked at some point without this hack, with an earlier rustc version.

@syl20bnr

syl20bnr commented Nov 27, 2024

I tried to run the examples with cargo run --example basics -r but got this error with a weird path:

Cannot query cuDNN version without ATen_cuda library.

My environment:

  • windows 11, powershell
  • pytorch manually downloaded: libtorch-win-shared-with-deps-2.5.1+cu124
  • PATH and LIBTORCH set
  • tch-rs 0.18.1 (current main)

Output:

Cuda available: false
Cudnn available: false
 3
 1
 4
 1
 5
[ CPUIntType{5} ]
-0.6448 -0.9447  1.4366  0.3457
-0.2852  0.7056  0.9534  2.4583
 0.8597  0.0024 -0.6590  0.0307
-0.5374  0.0160 -0.0363  1.3927
-0.5991  0.7843  0.9892 -0.2651
[ CPUFloatType{5,4} ]
 0.8552  0.5553  2.9366  1.8457
 1.2148  2.2056  2.4534  3.9583
 2.3597  1.5024  0.8410  1.5307
 0.9626  1.5160  1.4637  2.8927
 0.9009  2.2843  2.4892  1.2349
[ CPUFloatType{5,4} ]
 1.8552  1.5553  3.9366  2.8457
 2.2148  3.2056  3.4534  4.9583
 3.3597  2.5024  1.8410  2.5307
 1.9626  2.5160  2.4637  3.8927
 1.9009  3.2843  3.4892  2.2349
[ CPUFloatType{5,4} ]
 43.1000
 44.1000
 45.1000
[ CPUFloatType{3} ]
[3] 44.099998474121094
42
5
[ CPUDoubleType{} ]
1.24
[ CPUDoubleType{} ]
1.24
[ CPUDoubleType{} ]
Grad 1.24
has_mps: false
has_vulkan: false
thread 'main' panicked at src\wrappers\utils.rs:124:5:
called `Result::unwrap()` on an `Err` value: Torch("Cannot query cuDNN version without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -INCLUDE:?warp_size@cuda@at@@YAHXZ in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using link on your binary to see if there is a dependency on *_cuda.dll library.
Exception raised from versionCuDNN at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/detail/CUDAHooksInterface.h:153 (most recent call first):
00007FFA710883C9 00007FFA71088320 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFA71086C5A 00007FFA71086C00 c10.dll!c10::detail::torchCheckFail [<unknown file> @ <unknown line number>]
00007FF9F17ADDD0 00007FF9F17ADD80 torch_cpu.dll!at::CUDAHooksInterface::versionCuDNN [<unknown file> @ <unknown line number>]
00007FF79380CEA0 00007FF79380CB20 basics.exe!c10::ivalue::Future::waitAndThrow [<unknown file> @ <unknown line number>]
00007FF7937E3F7E <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E24BE <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E1026 <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E100C <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E93F9 <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E2EEC <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF79385777C 00007FF79380CB20 basics.exe!c10::ivalue::Future::waitAndThrow [<unknown file> @ <unknown line number>]
00007FFAC6FB259D 00007FFAC6FB2580 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FFAC7B2AF38 00007FFAC7B2AF10 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]
")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\release\examples\basics.exe` (exit code: 101)
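The panic message itself points at the usual Windows-side fix: force the linker to keep a reference into torch_cuda.dll so the dependency isn't discarded. A hedged build.rs sketch; the mangled symbol name is taken verbatim from the error text above, and I have not verified it against every libtorch version:

```rust
// build.rs sketch: on Windows/MSVC, emit an /INCLUDE link argument that
// forces an artificial reference into torch_cuda.dll, so the dynamic
// linker keeps the DLL dependency. The symbol name is an assumption taken
// from the PyTorch error message, not from tch-rs documentation.
fn msvc_cuda_link_args(target_os: &str) -> Vec<String> {
    let mut args = Vec::new();
    if target_os == "windows" {
        // Mangled name of at::cuda::warp_size(), quoted from the error text.
        args.push("/INCLUDE:?warp_size@cuda@at@@YAHXZ".to_string());
    }
    args
}

fn main() {
    // CARGO_CFG_TARGET_OS is set by Cargo when running build scripts.
    let os = std::env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    for arg in msvc_cuda_link_args(&os) {
        println!("cargo:rustc-link-arg={arg}");
    }
}
```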

@brendanbennett

brendanbennett commented Dec 1, 2024

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary. It would also be good if you could check out the tch-rs repo and run cargo run --example basics -r; this should print whether or not the CUDA bits are found.

Copying some of the build.rs file worked for me! (I'm compiling on WSL Ubuntu.)

fn main() {
    if let Some(lib_path) = std::env::var_os("DEP_TCH_LIBTORCH_LIB") {
        println!("cargo:rustc-link-arg=-Wl,-rpath={}", lib_path.to_string_lossy());
    }
    println!("cargo:rustc-link-arg=-Wl,--no-as-needed");
    println!("cargo:rustc-link-arg=-ltorch");
}

Notably, this didn't work until I removed the println!("cargo:rustc-link-arg=-Wl,--copy-dt-needed-entries"); line. I'm new to Rust; is it expected to have to write/edit a build file like this for each project?

@kylanoneal

@Anivie's solution worked for me on Windows 11:

use std::ffi::CString;
use winapi::um::libloaderapi::LoadLibraryA;

fn main() {
    // Force-load torch_cuda.dll so its symbols are registered before tch
    // queries CUDA availability.
    let path = CString::new("X:\\path\\to\\torch_cuda.dll").unwrap();
    unsafe {
        LoadLibraryA(path.as_ptr());
    }

    println!("cuda: {}", tch::Cuda::is_available());
    println!("cudnn: {}", tch::Cuda::cudnn_is_available());
}

@VirxEC

VirxEC commented Dec 24, 2024

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary. It would also be good if you could check out the tch-rs repo and run cargo run --example basics -r; this should print whether or not the CUDA bits are found.

After spending a while trying to get libtorch-cxx11 + ROCm 6.2 + my 6700 XT working in a Docker container, this finally made it work and I can enable CUDA.

Note: this is strictly a PyTorch matter, but I also had to export HSA_OVERRIDE_GFX_VERSION=10.3.0 to avoid HIP error: invalid device function at runtime. My full set of env vars is:

export LIBTORCH=/root/libtorch/
export LIBTORCH_INCLUDE=/root/libtorch/
export LIBTORCH_LIB=/root/libtorch/
export LD_LIBRARY_PATH=/root/libtorch/lib/:$LD_LIBRARY_PATH
export HSA_OVERRIDE_GFX_VERSION=10.3.0
