Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eclipse-temurin: add support for riscv64 #17403

Merged
merged 2 commits into from
Aug 23, 2024
Merged

Conversation

gdams
Copy link
Contributor

@gdams gdams commented Aug 21, 2024

Also has some minor reformatting changes to the entrypoint.sh

For now we're only adding Ubuntu Noble support, we could add previous Ubuntu versions if there was enough interest.

I'm not sure why the diff comment is failing

also has some minor reformatting changes to the entrypoint.sh
@gdams
Copy link
Contributor Author

gdams commented Aug 21, 2024

I've put the diff in a gist as I think it's too long for a comment https://gist.github.com/gdams/de1eec781fc16585276553c7685ff6c8

@yosifkit
Copy link
Member

Summary of the changes that I see in the diff. Hopefully I didn't miss something.

  1. riscv64 added to architectures with Dockerfile changes to support it 👍
  2. sh -> bash: hopefully this'll re-fix (and prevent issues for) those using dotted environment variables
  3. entrypoint.sh templating (removal of unused blocks per variant): This seems good.
  4. boilerplate Apache 2.0 license headers: No problems here. 👍
  5. ENV space -> =: buildkit linting-induced churn adding to the noise (LegacyKeyValueFormat linter is noisy with very little practical value moby/buildkit#5130).
  6. executable bit removed from git for entrypoint.sh: This with the next are a bit confusing. What issue is solved by changing from one to the other? Has something gone wrong with tracking executability via git that the --chmod via COPY was the better solution?
  7. executable bit added to entrypoint.sh via COPY (--chmod)

So, just a question on the executability bit. ⬆️

@gdams
Copy link
Contributor Author

gdams commented Aug 22, 2024

Thanks for the summary @yosifkit. That's correct, at one point we broke an update because the bits changed. We figured that now that build kit supports the chmod flag in the COPY command it made the most sense to explicitly define the permissions there. Are we doing this correctly?

@yosifkit
Copy link
Member

Are we doing this correctly?

It looks accurate, just a bit unusual is all.

Newest change is just adoptium/containers#636. 👍 Everything seems fine to me.

@yosifkit yosifkit merged commit fef0864 into docker-library:master Aug 23, 2024
70 of 71 checks passed
@gdams
Copy link
Contributor Author

gdams commented Aug 24, 2024

@yosifkit looks like CI is failing: https://doi-janky.infosiftr.net/job/multiarch/job/riscv64/job/eclipse-temurin/

[0.024s][error][os] Syscall: RISCV_FLUSH_ICACHE not available; error='Operation not permitted' (errno=EPERM)

This is the error caused by moby/moby#44319 (comment) which we've managed to work around locally by starting the container with --security-opt seccomp=unconfined

@gdams gdams deleted the patch-5 branch August 24, 2024 10:05
@gdams
Copy link
Contributor Author

gdams commented Aug 24, 2024

CC @luhenry who was looking at this recently for Adoptium

@yosifkit
Copy link
Member

🤔 That's super strange as we are running new enough versions of both Docker and Buildkit to have the updated seccomp profile. We'll keep looking on our side to see what we can figure out.

https://github.com/moby/moby/blob/v23.0.11/profiles/seccomp/default.json#L575-L585
https://github.com/moby/buildkit/blob/v0.13.2/vendor/github.com/docker/docker/profiles/seccomp/default.json#L572-L582

@tianon
Copy link
Member

tianon commented Aug 26, 2024

I tested with Docker v26 from Debian unstable and it also fails with the same error (and libseccomp2 is extremely up-to-date 😂), so something really suspicious is happening here, because as far as I can tell, the profile looks right. 🤔

@gdams
Copy link
Contributor Author

gdams commented Aug 27, 2024

For what it's worth, it builds okay on my Mac using buildx specifying riscv64 as the platform

@gdams
Copy link
Contributor Author

gdams commented Aug 30, 2024

For what it's worth, it builds okay on my Mac using buildx specifying riscv64 as the platform

@yosifkit @tianon is there a way we could build the image on an intel host rather than a riscv64 one?

 docker buildx build --platform linux/riscv64 -t foo .
[+] Building 29.8s (11/11) FINISHED                                                                                                                    docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                   0.0s
 => => transferring dockerfile: 5.14kB                                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/ubuntu:24.04                                                                                                        0.8s
 => [auth] library/ubuntu:pull token for registry-1.docker.io                                                                                                          0.0s
 => [internal] load .dockerignore                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                        0.0s
 => [1/5] FROM docker.io/library/ubuntu:24.04@sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee                                                  0.0s
 => [internal] load build context                                                                                                                                      0.0s
 => => transferring context: 4.78kB                                                                                                                                    0.0s
 => CACHED [2/5] RUN set -eux;     apt-get update;     DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends         curl         wget         fo  0.0s
 => [3/5] RUN set -eux;     ARCH="$(dpkg --print-architecture)";     case "${ARCH}" in        amd64)          ESUM='51fb4d03a4429c39d397d3a03a779077159317616550e4e7  16.5s
 => [4/5] RUN set -eux;     echo "Verifying install ...";     fileEncoding="$(echo 'System.out.println(System.getProperty("file.encoding"))' | jshell -s -)"; [ "$fi  12.0s
 => [5/5] COPY --chmod=755 entrypoint.sh /__cacert_entrypoint.sh                                                                                                       0.0s 
 => exporting to image                                                                                                                                                 0.4s 
 => => exporting layers                                                                                                                                                0.4s 
 => => writing image sha256:f6b1b687a25cb0316159ca4fa4d64502b7805d828713ff93c9e8b032a33e09f3                                                                           0.0s 
 => => naming to docker.io/library/foo    
docker run --platform linux/riscv64 foo java -version
openjdk version "21.0.4" 2024-07-16 LTS
OpenJDK Runtime Environment Temurin-21.0.4+7 (build 21.0.4+7-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (build 21.0.4+7-LTS, mixed mode, sharing)

@yosifkit
Copy link
Member

is there a way we could build the image on an intel host rather than a riscv64 one?

Unfortunately, we cannot; none of our build hosts are setup with qemu user mode emulation and our build orchestration cannot handle scheduling a "cross" build like this.

@sxa
Copy link
Contributor

sxa commented Sep 2, 2024

The simplest bypass for this for now would be to avoid running java i.e. exclude the -Xshare:dump which re-populates the shared class cache in the docker environment. We should be safe without it although it may have an impact on startup performance. I'll be honest I wasn't aware we were doing the cache regeneration until now ... but I'll note that it likely means the shared class cache in the docker image isn't the same as the ones in our tarballs ...

The 'verification' step after that in the Dockerfiles also try tun run java -version which I'd expect to suffer a similar fate if we switch off the cache generation, so maybe we need to bypass that on riscv64 for now too if we can't run in another environment. I'm not sure if any further tests are invoked after the dockerfile is run that might trip up on this too.

We do need to be clear that end-users will hit this too if they run the containers without the seccomp (or --privileged) options.

@tianon
Copy link
Member

tianon commented Sep 3, 2024

We do need to be clear that end-users will hit this too if they run the containers without the seccomp (or --privileged) options.

Yeah, this is the part I'm most worried about -- as far as I can tell, this is "fixed" in all the seccomp profiles and fully supported in libseccomp, so I'm not even sure why it's failing or how the profile would need to change to make it succeed.

Generally, we shouldn't have users running with --privileged, and --security-opt seccomp=unconfined is a slightly better workaround, although still not ideal.

@gdams
Copy link
Contributor Author

gdams commented Sep 9, 2024

I've done a bit of digging here and discovered that the issue seems to be with the architecture detection in the seccomp profile.

running with this will fail:

		{
			"names": [
				"riscv_flush_icache"
			],
			"action": "SCMP_ACT_ALLOW",
			"includes": {
				"arches": [
					"riscv64"
				]
			}
		},

but if you remove the includes it will work:

		{
			"names": [
				"riscv_flush_icache"
			],
			"action": "SCMP_ACT_ALLOW"
		},

I'm not a seccomp expert so I might have to delegate to @tianon or @yosifkit but this does seem to be a good starting point. I did try playing around with the arches values such as riscv but to no avail. I've checked that the arches value maps to uname -m which it does so I'm not sure where the seccomp actually gets it's architecture value from.

@yosifkit
Copy link
Member

yosifkit commented Sep 9, 2024

Ah, that gives me a clue as to where the problem might be. Let me go check moby/moby.

@gdams
Copy link
Contributor Author

gdams commented Sep 9, 2024

@yosifkit
Copy link
Member

yosifkit commented Sep 9, 2024

I was looking at the same spot; it doesn't have a mapping for riscv64 and I think that's the problem. Can you file a bug in moby/moby? (If you want to make a PR too, I wouldn't say no. ❤️)

@gdams
Copy link
Contributor Author

gdams commented Sep 9, 2024

@yosifkit see:

moby/moby#48454 and moby/moby#48455

@gdams
Copy link
Contributor Author

gdams commented Sep 9, 2024

@yosifkit will I need to raise the same PR in the buildkit repo?

@tianon
Copy link
Member

tianon commented Sep 9, 2024

I believe buildkit uses Moby's profile, so it should pick it up once it's in + vendoring updated 👍

@gdams
Copy link
Contributor Author

gdams commented Sep 10, 2024

@tianon do we need to wait for the next releases of moby now or can we unblock the CI?

@tianon
Copy link
Member

tianon commented Sep 10, 2024

I can backport the patch to the builds of moby we use, but it's significantly easier (and already done, as you noticed 😂) for me to backport the patch to our buildkit builds, and the timing is perfect because I think Temurin was in the next set of images moving to our newer build system that would use those buildkit builds: docker-library/meta-scripts#78 👀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants