Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump based CUDA image to ubuntu24.04 #1166

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

DwarKapex
Copy link
Contributor

@DwarKapex DwarKapex commented Nov 22, 2024

Ubuntu24.04 uses python-3.12 as a main interpreter. Unfortunately, not all python packages, we use here, has py-3.12 wheel for amd64/arm64, so need to build the following packages from source:

  1. TF-Text
  2. Lingvo

Also python-3.12 added a system-wide protection layer (PEP 668) when install packages using pip:

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

The solution is obvious -- install everything into venv (the initial solution was proposed by @olupton).

@DwarKapex DwarKapex marked this pull request as draft November 22, 2024 06:32
@DwarKapex DwarKapex self-assigned this Nov 22, 2024
@DwarKapex DwarKapex marked this pull request as ready for review November 22, 2024 06:32
# Create a system-wide venv for the pip-installed world inside the containers
RUN python -m venv --prompt jax /opt/venv && /opt/venv/bin/pip install --ignore-installed --no-cache-dir -e /opt/pip pip-tools
# Make sure `python` refers to the venv version
ENV PATH=/opt/venv/bin:${PATH}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sufficient instead of source /opt/venv/bin/activate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it. The search mechanism starts checking the PATH one at a time till first match. So, even if there is another python in the PATH, it picks up the first one:

$> docker run -it --rm ghcr.io/nvidia/jax-toolbox-internal:11967699553-base-amd64 bash -c "which python && which pip"
/opt/venv/bin/python
/opt/venv/bin/pip

Copy link
Collaborator

@yhtang yhtang Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/opt/venv/bin/activate performs additional actions besides appending to PATH. Are we sure it is OK to omit these in our containers?

.github/container/Dockerfile.jax Outdated Show resolved Hide resolved
@@ -2,7 +2,7 @@

ARG BASE_IMAGE=ghcr.io/nvidia/jax-mealkit:jax
ARG URLREF_MAXTEXT=https://github.com/google/maxtext.git#main
ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#v2.13.0
ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#master
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this tested? It does not seem to be related to the bump to Ubuntu 24.04.

Copy link
Contributor Author

@DwarKapex DwarKapex Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, related. Google merged changes to support wheel build for python3.12 via this PR. For now we have to use master because the last release is 2.18.0 published 1 month ago and the fixing PR was merged 2 weeks ago. Waiting for 2.18+ release to ping it here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, this make use track master TF-Text instead of a pinned version.
We could pin a commit as an alternative.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, this make use track master TF-Text instead of a pinned version. We could pin a commit as an alternative.

Our current 'standard' approach is to tracker the master in the Dockerfile, and then use manifest.yaml and the EXTRA_BUILD_ARGS mechanism for CI reproducibility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DwarKapex could you please add the plumbing for lingvo/tftext version tracking following the example of

URLREF_PAXML=${{ fromJson(inputs.SOURCE_URLREFS).PAXML }}
?

.github/container/Dockerfile.pax Show resolved Hide resolved
@@ -31,7 +31,7 @@ on:
required: true
DOCKERFILE:
type: string
description: "Dockerfile to use, e.g. .github/container/Dockerfile.t5x.amd64"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why renaming all the amd64 Dockerfiles? Is there a discussion that I missed? 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between amd64 and arm64 is that we build some wheels (like lingo, tf-text) for arm64, but not for amd64. With this PR we build these packages for both archs, so no need to have 2 different Dockerfiles. I remove arm64 version of Dockerfile and extension amd64.

Copy link
Collaborator

@yhtang yhtang Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would view the manual building of lingvo/tftext wheel as a temporary remedy that (unfortunately) complicates the Dockerfile. This workaround shouldn't stay forever. We should push for proper wheel builds by upstream and at the meantime keep the simpler Dockerfiles so that they will become the default again hopefully soon.

@DwarKapex DwarKapex changed the title Vkozlov/move to ubuntu24.04 Bump based CUDA image to ubuntu24.04 Nov 22, 2024
@DwarKapex
Copy link
Contributor Author

@olupton desperately need your input here =)

@yhtang
Copy link
Collaborator

yhtang commented Nov 23, 2024

Some builds are still broken 😜

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants