
tensorflow v2.18.0 (redux) #414

Merged
merged 38 commits into from
Feb 9, 2025
Conversation

h-vetinari
Member

Rebased from #412

regro-cf-autotick-bot and others added 28 commits October 25, 2024 04:08
jaxlib may need the same fix.
@hmaarrfk
Contributor

hmaarrfk commented Feb 2, 2025

GitHub is kinda terrible about this, but I'm thinking that something is wrong with the hermetic CUDA stuff:
#414 (comment)

@hmaarrfk
Contributor

hmaarrfk commented Feb 2, 2025

It might be that we want to do:

# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/tf_sig_build_dockerfiles/setup.cuda.sh#L28
ln -sf ${BUILD_PREFIX}/targets/x86_64-linux/lib/stubs/libcuda.so ${PREFIX}/lib/libcuda.so.1

instead, similar to the source I refer to.

@hmaarrfk
Contributor

hmaarrfk commented Feb 3, 2025

I'm still not happy that I can't import cuda tensorflow on a system that has no CUDA capable GPU.

On tensorflow 2.17, the following is blank:

ldd ${CONDA_PREFIX}/lib/python3.1/site-packages/tensorflow/python/platform/_pywrap_cpu_feature_guard.so  | grep libcuda

I'm going to be trying with the following patch:

diff --git a/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl b/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
index 528a1db7..6b280aa9 100644
--- a/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
+++ b/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
@@ -8,11 +8,6 @@ cc_import(
     shared_library = "lib/libcuda.so.%{libcuda_version}",
 )

-cc_import(
-    name = "libcuda_so_1",
-    shared_library = "lib/libcuda.so.1",
-)
-
 # TODO(ybaturina): remove workaround when min CUDNN version in JAX is updated to
 # 9.3.0.
 # Workaround for adding path of driver library symlink to RPATH of cc_binaries.
@@ -45,7 +40,6 @@ cc_library(
     %{comment}deps = [
         %{comment}":libcuda_so",
         %{comment}":fake_libcuda",
-        %{comment}":libcuda_so_1",
         %{comment}":driver_shared_library",
     %{comment}],
     visibility = ["//visibility:public"],
diff --git a/third_party/xla/third_party/tsl/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl b/third_party/xla/third_party/tsl/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
index 528a1db7..6b280aa9 100644
--- a/third_party/xla/third_party/tsl/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
+++ b/third_party/xla/third_party/tsl/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
@@ -8,11 +8,6 @@ cc_import(
     shared_library = "lib/libcuda.so.%{libcuda_version}",
 )

-cc_import(
-    name = "libcuda_so_1",
-    shared_library = "lib/libcuda.so.1",
-)
-
 # TODO(ybaturina): remove workaround when min CUDNN version in JAX is updated to
 # 9.3.0.
 # Workaround for adding path of driver library symlink to RPATH of cc_binaries.
@@ -45,7 +40,6 @@ cc_library(
     %{comment}deps = [
         %{comment}":libcuda_so",
         %{comment}":fake_libcuda",
-        %{comment}":libcuda_so_1",
         %{comment}":driver_shared_library",
     %{comment}],
     visibility = ["//visibility:public"],

locally to see what happens.

Update: without copying libcuda.so.1, compilation still fails. But when you do copy it and then inspect the resulting .so file:

objdump -T _pywrap_cpu_feature_guard.so

there is no mention of the need for libcuda …
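As an aside, a more direct way to check for a hard runtime dependency on the driver is to look at the DT_NEEDED entries rather than the dynamic symbol table. A minimal sketch with a stand-in library (since the real _pywrap_cpu_feature_guard.so path varies per install):

```shell
# Build a trivial shared object as a stand-in for the real extension module.
cat > demo.c <<'EOF'
int answer(void) { return 42; }
EOF
cc -shared -fPIC -o libdemo.so demo.c

# objdump -T lists exported dynamic symbols; the actual load-time library
# dependencies (DT_NEEDED) are reported by readelf -d (or objdump -p) instead.
if readelf -d libdemo.so | grep -q 'NEEDED.*libcuda'; then
    echo "hard dependency on libcuda"
else
    echo "no libcuda dependency"
fi
```

Running the same readelf check against the real extension module would show whether libcuda.so.1 ended up as a hard dependency.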

@hmaarrfk
Contributor

hmaarrfk commented Feb 3, 2025

@mgorny, @isuruf I don’t know if you have time to take a look at the overlinking to libcuda.so.1 that I believe is happening in some parts of the tensorflow library.

I don’t think we should need to depend on cuda-compat at build time, and the libraries that are failing to import should be importable even without a GPU.

I’ve tried. I just don’t know what to do at this point.

@ehfd
Member

ehfd commented Feb 4, 2025

I suggest reading #417, as it has major implications for the CUDA dependency. Please help prove otherwise on an older driver version if this issue (somehow) doesn't exist in conda-forge.

Not an issue after testing.

@hmaarrfk
Contributor

hmaarrfk commented Feb 4, 2025

Please help prove otherwise on an older driver version if this issue (somehow) doesn't exist in conda-forge.

I think the best way to help is to build this package locally with the existing versions, and test it on real hardware. If it fails, we could add new cuda builds, again, you can build and test locally and report your findings.

@ehfd
Member

ehfd commented Feb 4, 2025

Would it be possible to build this branch right now in the current state?

@hmaarrfk
Contributor

hmaarrfk commented Feb 4, 2025

Would it be possible to build this branch right now in the current state?

Generally there is consensus. However, the libcuda.so.1 issue makes me worried that we are overlinking everywhere, which makes me hesitant.

@ehfd
Member

ehfd commented Feb 5, 2025

@hmaarrfk I am happy to announce #417 is not a concern, and the conda-forge build is resilient to this issue unlike pip.

Just one thing I needed was os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=" + os.environ["CONDA_PREFIX"].
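For anyone hitting the same lookup failure, the equivalent workaround from the shell is a sketch like the following (it assumes a conda environment is active so that CONDA_PREFIX is set):

```shell
# XLA resolves CUDA data files (e.g. libdevice) via this flag; pointing it at
# the active conda prefix lets the runtime find them. It must be exported
# before Python imports TensorFlow to take effect.
export XLA_FLAGS="--xla_gpu_cuda_data_dir=${CONDA_PREFIX}"
echo "${XLA_FLAGS}"
```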

However, I did have corruption issues while using keras_hub.layers.TransformerEncoder, installed with pip after installing TensorFlow and Keras with conda. This merely justifies the need to build a keras-hub-feedstock (conda-forge/staged-recipes#28970) and a tensorflow-text-feedstock (conda-forge/staged-recipes#29008, mandatory dependency to keras-hub).

@vyasr

vyasr commented Feb 5, 2025

  File "$SP_DIR/tensorflow/python/platform/self_check.py", line 63, in preload_check
    from tensorflow.python.platform import _pywrap_cpu_feature_guard
ImportError: libnvrtc-builtins.so.12.6: cannot open shared object file: No such file or directory

Yeah, that .so.12.6 indicates that tensorflow is coupled so tightly to the CUDA version used to build it that we don't actually have forward compatibility. Not sure if this is expected @conda-forge/cuda?

Looking at the content of cuda-nvrtc, it's apparent that the builtins don't have major-level compatibility:

lib/libnvrtc-builtins.so.12.6
lib/libnvrtc-builtins.so.12.6.85
lib/libnvrtc.so.12
lib/libnvrtc.so.12.6.85
targets/x86_64-linux/lib/libnvrtc-builtins.so.12.6
targets/x86_64-linux/lib/libnvrtc-builtins.so.12.6.85
targets/x86_64-linux/lib/libnvrtc.so.12
targets/x86_64-linux/lib/libnvrtc.so.12.6.85

Perhaps it's unexpected that tensorflow links to libnvrtc-builtins.so directly? 🤔

In any case, it seems we should add a run-constraint on cuda-version {{ cuda_compiler_version }} for now.
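In recipe terms, such a constraint would look roughly like the following (an illustrative sketch; the exact jinja pinning used by the feedstock may differ):

```yaml
# meta.yaml fragment (illustrative): pin the runtime CUDA version to the
# build-time one, since libnvrtc-builtins offers no major-version
# compatibility guarantees.
requirements:
  run_constrained:
    - cuda-version {{ cuda_compiler_version }}
```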

The NVRTC-builtins library is indeed only intended for internal use and should not be linked to directly (discussed here in the docs). That library does not offer the same major version compatibility guarantees as the CUDA runtime does.

Regarding the tensorflow linkage, I don't know Tensorflow's build system at all but I know it's using Bazel as the primary driver. I'm not sure how linkage to CUDA is handled. If there is some part of the Bazel build system that is invoking a CMake build you might be running afoul of something like https://discourse.cmake.org/t/cmake-incorrectly-links-to-nvrtc-builtins/12723, which was fixed in https://gitlab.kitware.com/cmake/cmake/-/merge_requests/9890. If it's not using CMake but instead handling CUDA linkage directly in Bazel perhaps there is an analogous build rule that needs to be fixed to handle compatibility correctly.

@h-vetinari
Member Author

Thanks so much for the context @vyasr! I see that @hmaarrfk has already raised tensorflow/tensorflow#86413 - I guess we should backport that here?

@hmaarrfk
Contributor

hmaarrfk commented Feb 5, 2025

I guess we should backport that here?

It is backported. Though I still can’t find the line that tries to link to libcuda.so.1.

@vyasr

vyasr commented Feb 6, 2025

I guess we should backport that here?

It is backported. Though I still can’t find the line that tries to link to libcuda.so.1.

I haven't looked through the history of this thread to see why you need it (are you also trying to prevent linkage to the CUDA driver?), but it looks like those build rules are defined here. In particular they seem to be explicitly adding a dependency on libcuda.so.1.

@hmaarrfk
Contributor

hmaarrfk commented Feb 6, 2025

@vyasr I understand that libcuda is needed somewhere in the library, but for now it seems to be getting pulled in by some CPU-looking sections when it didn’t in 2.17.

#414 (comment)

see this specific comment

@vyasr

vyasr commented Feb 6, 2025

Got it. Yeah, I'm afraid I'd have to familiarize myself with a lot more Bazel to be of help here. It's not clear to me what the entrypoint even is to the hermetic GPU build paths. From a quick glance it seems like most of the BUILD files use tf_cuda_library, which is defined here, with the underlying command coming from xla, but I don't immediately see how that would lead to the cuda build rules defined in the top-level third_party directory.

@njzjz
Member

njzjz commented Feb 7, 2025

I'm going to be trying with the following patch:

diff --git a/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl b/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
index 528a1db7..6b280aa9 100644
--- a/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl
+++ b/third_party/gpus/cuda/hermetic/cuda_driver.BUILD.tpl

cuda_driver.BUILD.tpl may not be used, see https://github.com/tensorflow/tensorflow/blob/6550e4bd80223cdb8be6c3afd1f81e86a4d433c3/third_party/gpus/cuda/hermetic/cuda_cudart.BUILD.tpl#L31-L40

cc_library(
    name = "cudart",
    %{comment}deps = select({
        %{comment}"@cuda_driver//:forward_compatibility": ["@cuda_driver//:nvidia_driver"],
        %{comment}"//conditions:default": [":cuda_driver"],
    %{comment}}) + [
        %{comment}":cudart_shared_library",
    %{comment}],
    visibility = ["//visibility:public"],
)

The flag @cuda_driver//:forward_compatibility is false by default and is not enabled. So it may fall back to cuda_stub in cuda_cudart.BUILD.tpl:

cc_import(
    name = "cuda_stub",
    interface_library = "lib/stubs/libcuda.so",
    system_provided = 1,
)

@h-vetinari h-vetinari merged commit 4411337 into conda-forge:main Feb 9, 2025
4 checks passed
@h-vetinari h-vetinari deleted the 2.18 branch February 9, 2025 10:53