
Device::cuda_if_available() is false #907

Open
Jaroslove opened this issue Oct 31, 2024 · 11 comments

@Jaroslove

I use tch = "0.18.0" on Windows 11.
Downloaded libtorch-win-shared-with-deps-2.5.0+cu118.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

My env vars are:
LIBTORCH=D:\softwares\Libtorch\libtorch-win-shared-with-deps-2.5.0+cu118\libtorch
TORCH_CUDA_VERSION=11.8
TORCH_VERSION=cu118

Could anybody help me? What do I need to do to get CUDA working?

@Anivie

Anivie commented Nov 9, 2024

Same in an Ubuntu environment. Is there any way to solve this?

@vincestorm

Exactly the same issue on Windows 11.

@LaurentMazare
Owner

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary.
It would also be good if you could check out the tch-rs repo and run cargo run --example basics -r; this should print whether or not the CUDA bits are found.
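For reference, a minimal build.rs along those lines might look like the following sketch. This is based on the flags other commenters in this thread report using for Linux-style linkers, not the exact tch-rs script; DEP_TCH_LIBTORCH_LIB is the libtorch lib directory that torch-sys exports to dependent build scripts.

```rust
// build.rs -- a minimal sketch for Linux-style linkers (an assumption,
// not the exact tch-rs build script).
fn cargo_link_args() -> Vec<String> {
    let mut args = Vec::new();
    // torch-sys exports the libtorch lib directory as DEP_TCH_LIBTORCH_LIB;
    // embedding it as an rpath lets the dynamic linker find the .so files.
    if let Some(lib_path) = std::env::var_os("DEP_TCH_LIBTORCH_LIB") {
        args.push(format!("-Wl,-rpath={}", lib_path.to_string_lossy()));
    }
    // Keep libtorch in the binary's DT_NEEDED entries even if no symbol is
    // referenced directly, so the CUDA backend library gets pulled in too.
    args.push("-Wl,--no-as-needed".to_string());
    args.push("-ltorch".to_string());
    args
}

fn main() {
    for arg in cargo_link_args() {
        println!("cargo:rustc-link-arg={arg}");
    }
}
```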

@Anivie

Anivie commented Nov 9, 2024

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary. It would also be good if you could check out the tch-rs repo and run cargo run --example basics -r; this should print whether or not the CUDA bits are found.

Thanks! I tried your suggestion, but unfortunately, the problem still persists.
I checked the contents of target/debug/build/torch-sys-xxx/output and saw that it is already trying to link torch_cuda, but tch::Cuda::is_available is still false.

...
cargo:rustc-link-lib=stdc++
cargo:rustc-link-lib=static=tch
cargo:rustc-link-lib=torch_cuda
cargo:rustc-link-lib=torch_cpu
cargo:rustc-link-lib=torch
cargo:rustc-link-lib=c10
cargo:rustc-link-lib=gomp
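A quick way to check whether the final binary actually kept the torch_cuda dependency (with as-needed linking, the linker can drop it even though -ltorch_cuda was requested at build time) is to list the binary's shared-library dependencies. A sketch; /bin/ls is only a placeholder path standing in for your own binary:

```shell
# Print the shared libraries a binary depends on. Substitute your compiled
# binary (e.g. target/release/examples/basics) for /bin/ls, which is just
# a placeholder here. For a correctly linked tch binary, libtorch_cuda.so
# should show up in this list.
ldd /bin/ls
```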

@Anivie

Anivie commented Nov 9, 2024

Well, I'm not sure whether this is the final solution, and I still can't tell exactly what the problem is, but it seems to have solved my issue for now.

The answer in this link worked for me.

use std::ffi::CString;
use libc::{dlopen, RTLD_LAZY};

fn main() {
    // Force-load libtorch_cuda so its symbols are registered before tch
    // queries CUDA availability.
    let path = CString::new("/root/libtorch/lib/libtorch_cuda.so").unwrap();
    unsafe {
        dlopen(path.as_ptr(), RTLD_LAZY);
    }

    println!("cuda: {}", tch::Cuda::is_available());
    println!("cudnn: {}", tch::Cuda::cudnn_is_available());
}

@pubfnbar

pubfnbar commented Nov 18, 2024

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary.

I had this same issue on Ubuntu 22.04.5, tch 0.18.0, and libtorch 2.5.0 cxx11-abi for CUDA 12.4 with rustc 1.84.0-nightly (81eef2d36 2024-11-11), i.e. all the following lines printed "false":

println!("{:?}", tch::Cuda::is_available());
println!("{:?}", tch::utils::has_cuda());
println!("{:?}", tch::utils::has_cudart());

And when I added the following build.rs to my crate root...

fn main() {
    let os = std::env::var("CARGO_CFG_TARGET_OS").expect("Unable to get TARGET_OS");
    match os.as_str() {
        "linux" | "windows" => {
            if let Some(lib_path) = std::env::var_os("DEP_TCH_LIBTORCH_LIB") {
                println!("cargo:rustc-link-arg=-Wl,-rpath={}", lib_path.to_string_lossy());
            }
            println!("cargo:rustc-link-arg=-Wl,--no-as-needed");
            println!("cargo:rustc-link-arg=-Wl,--copy-dt-needed-entries");
            println!("cargo:rustc-link-arg=-ltorch");
        }
        _ => {}
    }
}

...the build failed with the following error:

error: linking with `cc` failed: exit status: 1
  |
  = note: LC_ALL="C" PATH=<paths> VSLANG="1033" "cc" "-m64" <o-file paths> "-Wl,--as-needed" "-Wl,-Bstatic" <rlib-file paths> "-Wl,-Bdynamic" "-lstdc++" "-ltorch_cuda" "-ltorch_cpu" "-ltorch" "-lc10" "-lgomp" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-B/home/myname/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/gcc-ld" "-fuse-ld=lld" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/usr/local/libtorch/lib" "-L" "mycrate/target/debug/build/torch-sys-474ddb9a018375cc/out" "-L" "mycrate/target/debug/build/bzip2-sys-06df454e922f14bd/out/lib" "-L" "mycrate/target/debug/build/zstd-sys-226893407ff59632/out" "-L" "/home/myname/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "mycrate/target/debug/deps/mycrate-4e0e8b81a3a3e1af" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-nodefaultlibs" "-Wl,--no-as-needed" "-Wl,--copy-dt-needed-entries" "-ltorch"
  = note: rust-lld: error: unknown argument '--copy-dt-needed-entries'
          collect2: error: ld returned 1 exit status

But simply removing the following line from build.rs fixed that error, and now CUDA is detected by tch just fine:

println!("cargo:rustc-link-arg=-Wl,--copy-dt-needed-entries");

By the way, according to the ld manual (which I believe lld tries to adhere to), it seems that --copy-dt-needed-entries is the default option anyway, so perhaps it's not actually needed.

@davencyw

Had the same issue with the same settings as @pubfnbar, and his solution worked. I remember that it worked at some point without this hack, with an earlier rustc version.

@syl20bnr

syl20bnr commented Nov 27, 2024

I tried to run the examples with cargo run --example basics -r but got this error with a weird path:

Cannot query cuDNN version without ATen_cuda library.

My environment:

  • windows 11, powershell
  • pytorch manually downloaded: libtorch-win-shared-with-deps-2.5.1+cu124
  • PATH and LIBTORCH set
  • tch-rs 0.18.1 (current main)

Output:

Cuda available: false
Cudnn available: false
 3
 1
 4
 1
 5
[ CPUIntType{5} ]
-0.6448 -0.9447  1.4366  0.3457
-0.2852  0.7056  0.9534  2.4583
 0.8597  0.0024 -0.6590  0.0307
-0.5374  0.0160 -0.0363  1.3927
-0.5991  0.7843  0.9892 -0.2651
[ CPUFloatType{5,4} ]
 0.8552  0.5553  2.9366  1.8457
 1.2148  2.2056  2.4534  3.9583
 2.3597  1.5024  0.8410  1.5307
 0.9626  1.5160  1.4637  2.8927
 0.9009  2.2843  2.4892  1.2349
[ CPUFloatType{5,4} ]
 1.8552  1.5553  3.9366  2.8457
 2.2148  3.2056  3.4534  4.9583
 3.3597  2.5024  1.8410  2.5307
 1.9626  2.5160  2.4637  3.8927
 1.9009  3.2843  3.4892  2.2349
[ CPUFloatType{5,4} ]
 43.1000
 44.1000
 45.1000
[ CPUFloatType{3} ]
[3] 44.099998474121094
42
5
[ CPUDoubleType{} ]
1.24
[ CPUDoubleType{} ]
1.24
[ CPUDoubleType{} ]
Grad 1.24
has_mps: false
has_vulkan: false
thread 'main' panicked at src\wrappers\utils.rs:124:5:
called `Result::unwrap()` on an `Err` value: Torch("Cannot query cuDNN version without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -INCLUDE:?warp_size@cuda@at@@YAHXZ in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using link on your binary to see if there is a dependency on *_cuda.dll library.
Exception raised from versionCuDNN at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/detail/CUDAHooksInterface.h:153 (most recent call first):
00007FFA710883C9 00007FFA71088320 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFA71086C5A 00007FFA71086C00 c10.dll!c10::detail::torchCheckFail [<unknown file> @ <unknown line number>]
00007FF9F17ADDD0 00007FF9F17ADD80 torch_cpu.dll!at::CUDAHooksInterface::versionCuDNN [<unknown file> @ <unknown line number>]
00007FF79380CEA0 00007FF79380CB20 basics.exe!c10::ivalue::Future::waitAndThrow [<unknown file> @ <unknown line number>]
00007FF7937E3F7E <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E24BE <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E1026 <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E100C <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E93F9 <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF7937E2EEC <unknown symbol address> basics.exe!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF79385777C 00007FF79380CB20 basics.exe!c10::ivalue::Future::waitAndThrow [<unknown file> @ <unknown line number>]
00007FFAC6FB259D 00007FFAC6FB2580 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FFAC7B2AF38 00007FFAC7B2AF10 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]
")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\release\examples\basics.exe` (exit code: 101)
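The panic message itself points at the usual Windows-side fix: force the linker to keep a reference into torch_cuda.dll so the dependency isn't discarded. A hedged build.rs sketch; the mangled symbol name is taken verbatim from the error text above, and I have not verified it against every libtorch version:

```rust
// build.rs sketch: on Windows/MSVC, emit an /INCLUDE link argument that
// forces an artificial reference into torch_cuda.dll, so the dynamic
// linker keeps the DLL dependency. The symbol name is an assumption taken
// from the PyTorch error message, not from tch-rs documentation.
fn msvc_cuda_link_args(target_os: &str) -> Vec<String> {
    let mut args = Vec::new();
    if target_os == "windows" {
        // Mangled name of at::cuda::warp_size(), quoted from the error text.
        args.push("/INCLUDE:?warp_size@cuda@at@@YAHXZ".to_string());
    }
    args
}

fn main() {
    // CARGO_CFG_TARGET_OS is set by Cargo when running build scripts.
    let os = std::env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    for arg in msvc_cuda_link_args(&os) {
        println!("cargo:rustc-link-arg={arg}");
    }
}
```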

@brendanbennett

brendanbennett commented Dec 1, 2024

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary. It would also be good if you could check out the tch-rs repo and run cargo run --example basics -r; this should print whether or not the CUDA bits are found.

Copying some of the build.rs file worked for me! (I'm compiling on WSL Ubuntu.)

fn main() {
    if let Some(lib_path) = std::env::var_os("DEP_TCH_LIBTORCH_LIB") {
        println!("cargo:rustc-link-arg=-Wl,-rpath={}", lib_path.to_string_lossy());
    }
    println!("cargo:rustc-link-arg=-Wl,--no-as-needed");
    println!("cargo:rustc-link-arg=-ltorch");
}

Notably, this didn't work until I removed the println!("cargo:rustc-link-arg=-Wl,--copy-dt-needed-entries"); line. I'm new to Rust; is it expected to have to write/edit a build file like this for each project?

@kylanoneal

@Anivie's solution worked for me on Windows 11:

use std::ffi::CString;
use winapi::um::libloaderapi::LoadLibraryA;

fn main() {
    // Force-load torch_cuda.dll so its symbols are registered before tch
    // queries CUDA availability.
    let path = CString::new("X:\\path\\to\\torch_cuda.dll").unwrap();
    unsafe {
        LoadLibraryA(path.as_ptr());
    }

    println!("cuda: {}", tch::Cuda::is_available());
    println!("cudnn: {}", tch::Cuda::cudnn_is_available());
}

@VirxEC

VirxEC commented Dec 24, 2024

Ensure that you have a build.rs script in your repo along the lines of the one in tch-rs here; this ensures that the proper flags are used when linking the binary. It would also be good if you could check out the tch-rs repo and run cargo run --example basics -r; this should print whether or not the CUDA bits are found.

After spending a while trying to get libtorch-cxx11 + ROCm 6.2 + my 6700 XT working in a Docker container, this finally made it work and I can enable CUDA.

Note: this is strictly a PyTorch matter, but I also had to export HSA_OVERRIDE_GFX_VERSION=10.3.0 to avoid HIP error: invalid device function at runtime. My full set of env vars is:

export LIBTORCH=/root/libtorch/
export LIBTORCH_INCLUDE=/root/libtorch/
export LIBTORCH_LIB=/root/libtorch/
export LD_LIBRARY_PATH=/root/libtorch/lib/:$LD_LIBRARY_PATH
export HSA_OVERRIDE_GFX_VERSION=10.3.0
