fix tensorflow pip install (doesn't work on any JP/L4T Versions) #760

Open
wants to merge 2 commits into
base: dev

gstangel
Contributor

Build command:

user@host:~/jetson-containers/logs/20241226_221017/build$ cat ros_humble-ros-base-l4t-r36.3.0-tensorflow2.sh
#!/usr/bin/env bash

DOCKER_BUILDKIT=0 docker build --network=host --tag ros:humble-ros-base-l4t-r36.3.0-tensorflow2 \
--file /home/aero/jetson-containers/packages/ml/tensorflow/Dockerfile \
--build-arg BASE_IMAGE=ros:humble-ros-base-l4t-r36.3.0-protobuf_cpp \
--build-arg TENSORFLOW_VERSION="2.16.1" \
--build-arg TENSORFLOW_URL="https://developer.download.nvidia.com/compute/redist/jp/v60/tensorflow/tensorflow-2.16.1+nv24.06-cp310-cp310-linux_aarch64.whl" \
--build-arg TENSORFLOW_WHL="tensorflow-2.16.1+nv24.06-cp310-cp310-linux_aarch64.whl" \
--build-arg PYTHON_VERSION_MAJOR="3" \
--build-arg PYTHON_VERSION_MINOR="10" \
--build-arg FORCE_BUILD="off" \
/home/aero/jetson-containers/packages/ml/tensorflow \
2>&1 | tee /home/aero/jetson-containers/logs/20241226_221017/build/ros_humble-ros-base-l4t-r36.3.0-tensorflow2.txt; exit ${PIPESTATUS[0]}

Result:

 cat ros_humble-ros-base-l4t-r36.3.0-tensorflow2.txt
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.

Sending build context to Docker daemon  61.95kB
Step 1/5 : ARG BASE_IMAGE
Step 2/5 : FROM ${BASE_IMAGE}
 ---> 7d00558e15a1
Step 3/5 : ARG TENSORFLOW_URL     TENSORFLOW_WHL     HDF5_DIR="/usr/lib/aarch64-linux-gnu/hdf5/serial/"     MAKEFLAGS=-j$(nproc)     FORCE_BUILD
 ---> Running in 7997ed85083c
 ---> Removed intermediate container 7997ed85083c
 ---> 401a01062b21
Step 4/5 : COPY install.sh /tmp/tensorflow/
 ---> 5eb0c716e22e
Step 5/5 : RUN /tmp/tensorflow/install.sh
 ---> Running in 3fab2073b17b
+ bash /tmp/TENSORFLOW/link_cuda.sh
bash: /tmp/TENSORFLOW/link_cuda.sh: No such file or directory
The command '/bin/sh -c /tmp/tensorflow/install.sh' returned a non-zero code: 127

This then revealed the following error in the version comparison (after resolving the previous error by copying link_cuda.sh into the Docker build):

+ apt-get clean
+ '[' off == on ']'
++ echo ' <= 2.16.1'
++ bc
(standard_in) 1: syntax error
+ '[' -eq 1 ']'
/tmp/tensorflow/install.sh: line 47: [: -eq: unary operator expected
+ pip3 install --no-cache-dir --verbose

That then led to a successful install/test on JP6.1 (CUDA 12.6).

The root cause was that the version comparison used bc, which only handles plain decimal numbers, so a three-part version such as 2.16.1 is a syntax error.
This PR moves the comparison to dpkg --compare-versions.
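
A minimal sketch of the kind of comparison dpkg --compare-versions allows, assuming a Debian-based image where dpkg is available. The helper name version_le is hypothetical and is not taken from install.sh. The guard for an empty operand reflects the ' <= 2.16.1' line in the trace above, where the left-hand side expanded to nothing.

```shell
# Hypothetical helper (not from install.sh): succeeds when version $1 <= $2.
version_le() {
    # The trace shows an empty left operand, so reject that early.
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "version_le: empty version operand" >&2
        return 2
    fi
    # dpkg compares dot-separated components numerically, so 2.9.0 < 2.16.1.
    dpkg --compare-versions "$1" le "$2"
}

# Example: TENSORFLOW_VERSION is the build-arg from the command above.
TENSORFLOW_VERSION="${TENSORFLOW_VERSION:-2.16.1}"
if version_le "$TENSORFLOW_VERSION" "2.16.1"; then
    echo "version ${TENSORFLOW_VERSION} is <= 2.16.1"
else
    echo "version ${TENSORFLOW_VERSION} is > 2.16.1"
fi
```

Unlike a decimal comparison, this treats 2.9.0 as older than 2.16.1, which is the ordering a version check needs.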

It is unknown whether link_cuda.sh is needed for the install, but fixing the copy resulted in a successful build.
link_cuda.sh needs modifications to support different CUDA versions; I am leaving it unchanged because the intended direction is not known.
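
One possible way to harden the helper call against the path mismatch seen in the first log (/tmp/TENSORFLOW vs /tmp/tensorflow) is to resolve the helper relative to the install script itself and fail with a clear message when it is missing. This is only a sketch under that assumption, not the change made in this PR; run_helper and SCRIPT_DIR are hypothetical names.

```shell
# Hypothetical helper (not from the PR): run a script from a given directory,
# returning 127 with a readable message when the file is not there.
run_helper() {
    helper_path="$1/$2"
    if [ ! -f "$helper_path" ]; then
        echo "missing helper: $helper_path (was it COPY'd into the image?)" >&2
        return 127
    fi
    bash "$helper_path"
}

# Directory of the running script, so the lookup does not depend on a
# hand-typed path.
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Usage inside install.sh would then look like:
#   run_helper "$SCRIPT_DIR" link_cuda.sh
```

This only helps if the Dockerfile copies link_cuda.sh next to install.sh, which is the part this PR fixes.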

@johnnynunez
Contributor

johnnynunez commented Dec 27, 2024

Avoid the CUDA link step if TensorFlow is >= 2.18 and JAX is >= 0.4.34. TensorFlow and JAX now use hermetic CUDA, which natively supports Jetson.
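
The suggestion above could be sketched roughly as follows, assuming dpkg is available in the image. The function name needs_cuda_link is hypothetical, and the sketch only covers the TensorFlow threshold (2.18); a JAX check against 0.4.34 would follow the same pattern.

```shell
# Hypothetical check (not from the PR): succeeds when the CUDA link step is
# still wanted, i.e. for TensorFlow versions older than 2.18.
needs_cuda_link() {
    dpkg --compare-versions "$1" lt "2.18"
}

for v in 2.16.1 2.18.0; do
    if needs_cuda_link "$v"; then
        echo "tensorflow $v: run link_cuda.sh"
    else
        echo "tensorflow $v: skip link_cuda.sh (hermetic CUDA)"
    fi
done
```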

@gstangel
Contributor Author

gstangel commented Dec 27, 2024 via email

@johnnynunez
Contributor

What is the proper way to build tensorflow2 using jetson-containers?

Right now, it is only supported for JetPack >= 6.
For other JetPack versions, NVIDIA has wheels on their pages.

@gstangel
Contributor Author

@johnnynunez
I understand, this project is a huge undertaking. Are there plans to get it up to date with JetPack 6.1? It seems like some optimization is needed around using prebuilt wheels instead of building from source for all JetPack versions.

@johnnynunez
Contributor

johnnynunez commented Dec 27, 2024

It's updated for JetPack 6.1:
https://pypi.jetson-ai-lab.dev/jp6/cu126
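
As a rough illustration of using prebuilt wheels from that index, the sketch below only prints the pip command rather than running it. It assumes the index serves a wheel under the package name tensorflow, which is not confirmed in this thread; jetson_pip_cmd is a hypothetical helper.

```shell
# Hypothetical dry-run helper: print the pip command for a given index URL
# and package list without executing it.
jetson_pip_cmd() {
    index_url="$1"
    shift
    echo "pip3 install --no-cache-dir --index-url ${index_url} $*"
}

# The index URL is the one linked above; the package name is an assumption.
jetson_pip_cmd "https://pypi.jetson-ai-lab.dev/jp6/cu126" tensorflow
```

Copying the printed line into a JetPack 6 / CUDA 12.6 environment would attempt the actual install.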

@gstangel
Contributor Author

sorry, I meant jetson-containers as a whole

@dusty-nv
Owner

dusty-nv commented Dec 27, 2024 via email
