tensorflow v2.18.0 (redux) #414
Conversation
…nda-forge-pinning 2024.11.09.23.22.20
…nda-forge-pinning 2024.11.10.22.41.52
…onda-forge-pinning 2024.11.28.16.32.05
…onda-forge-pinning 2024.11.28.20.27.28
…onda-forge-pinning 2024.12.12.09.21.41
jaxlib may need the same fix.
…onda-forge-pinning 2024.12.16.21.51.05
GitHub is kinda terrible about this, but I'm thinking that something is wrong with the hermetic CUDA stuff.
It might be that we want to do:
…
instead, similar to the source I refer to.
…nda-forge-pinning 2025.02.01.03.32.47
I'm still not happy that I can't import CUDA tensorflow on a system that has no CUDA-capable GPU. On tensorflow 2.17, the following is blank:
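(A hedged sketch of the kind of check meant here, assuming Linux, binutils' readelf on PATH, and tensorflow installed in the active environment: scan the package's shared objects for a direct DT_NEEDED entry on libcuda.so.1.)

```python
# Illustrative only: list tensorflow shared objects whose dynamic section
# mentions libcuda.so.1. On a healthy CPU-importable build this prints nothing.
import pathlib
import site
import subprocess

for root in site.getsitepackages():
    for so in sorted(pathlib.Path(root, "tensorflow").rglob("*.so*")):
        dyn = subprocess.run(
            ["readelf", "-d", str(so)],
            capture_output=True, text=True, check=False,
        ).stdout
        if "libcuda.so.1" in dyn:
            print(so)  # non-empty output is the overlinking symptom
```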
I'm going to be trying with the following patch locally to see what happens:

```diff
diff --git a/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl b/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
index 528a1db7..6b280aa9 100644
--- a/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
+++ b/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
@@ -8,11 +8,6 @@ cc_import(
     shared_library = "lib/libcuda.so.%{libcuda_version}",
 )
 
-cc_import(
-    name = "libcuda_so_1",
-    shared_library = "lib/libcuda.so.1",
-)
-
 # TODO(ybaturina): remove workaround when min CUDNN version in JAX is updated to
 # 9.3.0.
 # Workaround for adding path of driver library symlink to RPATH of cc_binaries.
@@ -45,7 +40,6 @@ cc_library(
     %{comment}deps = [
     %{comment}":libcuda_so",
     %{comment}":fake_libcuda",
-    %{comment}":libcuda_so_1",
     %{comment}":driver_shared_library",
     %{comment}],
     visibility = ["//visibility:public"],
diff --git a/third_party/xla/third_party/tsl/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl b/third_party/xla/third_party/tsl/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
index 528a1db7..6b280aa9 100644
--- a/third_party/xla/third_party/tsl/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
+++ b/third_party/xla/third_party/tsl/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
@@ -8,11 +8,6 @@ cc_import(
     shared_library = "lib/libcuda.so.%{libcuda_version}",
 )
 
-cc_import(
-    name = "libcuda_so_1",
-    shared_library = "lib/libcuda.so.1",
-)
-
 # TODO(ybaturina): remove workaround when min CUDNN version in JAX is updated to
 # 9.3.0.
 # Workaround for adding path of driver library symlink to RPATH of cc_binaries.
@@ -45,7 +40,6 @@ cc_library(
     %{comment}deps = [
     %{comment}":libcuda_so",
     %{comment}":fake_libcuda",
-    %{comment}":libcuda_so_1",
     %{comment}":driver_shared_library",
     %{comment}],
     visibility = ["//visibility:public"],
```

Update: without copying the libcuda.so.1, compilation still fails. But when you do it and inspect the .so file, there is no mention of the need for libcuda …
@mgorny, @isuruf I don't know if you have time to take a look at the overlinking that I believe is happening from some parts of the tensorflow library to libcuda.so.1. I don't think we should need to depend on cuda-compat at build time, and the libraries that are failing to import should be importable even without a GPU. I've tried; I just don't know what to do at this point.
Not an issue after testing. |
I think the best way to help is to build this package locally with the existing versions and test it on real hardware. If it fails, we could add new cuda builds; again, you can build and test locally and report your findings.
Would it be possible to build this branch right now, in its current state?
Generally there is consensus. However, the libcuda.so.1 issue makes me worried that we are overlinking everywhere, which makes me hesitant.
@hmaarrfk I am happy to announce #417 is not a concern, and the conda-forge build is resilient to this issue, unlike pip. Just one thing I needed was … However, I did have corruption issues while using …
The NVRTC-builtins library is indeed only intended for internal use and should not be linked to directly (discussed here in the docs). That library does not offer the same major version compatibility guarantees as the CUDA runtime does.

Regarding the tensorflow linkage, I don't know Tensorflow's build system at all, but I know it's using Bazel as the primary driver. I'm not sure how linkage to CUDA is handled. If there is some part of the Bazel build system that is invoking a CMake build, you might be running afoul of something like https://discourse.cmake.org/t/cmake-incorrectly-links-to-nvrtc-builtins/12723, which was fixed in https://gitlab.kitware.com/cmake/cmake/-/merge_requests/9890. If it's not using CMake but instead handling CUDA linkage directly in Bazel, perhaps there is an analogous build rule that needs to be fixed to handle compatibility correctly.
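(As a hedged illustration of that contract, assuming a Linux machine with CUDA 12's NVRTC installed: applications should load libnvrtc by its public, versioned soname and let it pull in libnvrtc-builtins privately.)

```python
# Illustrative only: load NVRTC via its versioned soname. The builtins
# library ends up in the process as a private dependency of libnvrtc,
# not as something we linked to ourselves.
import ctypes

nvrtc = ctypes.CDLL("libnvrtc.so.12")
major, minor = ctypes.c_int(), ctypes.c_int()
nvrtc.nvrtcVersion(ctypes.byref(major), ctypes.byref(minor))
print(f"NVRTC {major.value}.{minor.value}")

with open("/proc/self/maps") as maps:
    names = {line.rsplit("/", 1)[-1].strip() for line in maps if "nvrtc" in line}
print(names)  # typically also shows a libnvrtc-builtins.so pulled in transitively
```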
Thanks so much for the context @vyasr! I see that @hmaarrfk has already raised tensorflow/tensorflow#86413 - I guess we should backport that here?
It is backported, though I still can't find the line that tries to link to libcuda.so.1.
I haven't looked through the history of this thread to see why you need it (are you also trying to prevent linkage to the CUDA driver?), but it looks like those build rules are defined here. In particular, they seem to be explicitly adding a dependency on libcuda.so.1.
@vyasr I understand that libcuda is needed somewhere in the library, but for now it seems to be getting pulled in by some CPU-looking sections when it didn't in 2.17; see this specific comment.
Got it. Yeah, I'm afraid I'd have to familiarize myself with a lot more Bazel to be of help here. It's not clear to me what the entrypoint even is to the hermetic GPU build paths. From a quick glance it seems like most of the BUILD files use:

```
cc_library(
    name = "cudart",
    %{comment}deps = select({
        %{comment}"@cuda_driver//:forward_compatibility": ["@cuda_driver//:nvidia_driver"],
        %{comment}"//conditions:default": [":cuda_driver"],
    %{comment}}) + [
        %{comment}":cudart_shared_library",
    %{comment}],
    visibility = ["//visibility:public"],
)
```

The flag …

```
cc_import(
    name = "cuda_stub",
    interface_library = "lib/stubs/libcuda.so",
    system_provided = 1,
)
```
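(For contrast, a hedged sketch of what stub linkage defers to at runtime: the real driver is resolved only if present, and its absence is survivable on a GPU-less machine, which is exactly the behavior the import problem above violates. cuInit and cuDeviceGetCount are real driver-API entry points; the helper itself is hypothetical.)

```python
# Illustrative only: resolve the CUDA driver at runtime instead of link time.
import ctypes

def cuda_device_count():
    try:
        driver = ctypes.CDLL("libcuda.so.1")  # real driver, if installed
    except OSError:
        return 0  # no driver present: CPU-only code paths should still work
    if driver.cuInit(0) != 0:
        return 0
    count = ctypes.c_int(0)
    driver.cuDeviceGetCount(ctypes.byref(count))
    return count.value

print(cuda_device_count())
```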
Rebased from #412