
@zzak
Member

@zzak zzak commented Feb 14, 2025

This extracts the changes from #141, except for switching the queue to use Buildkite hosted agents.

@rafaelfranca
Member

Can we clean up the commits in this PR? There is a lot of back and forth with the docker-compose version and no explanation why. Otherwise the code looks good.

@zzak zzak force-pushed the buildkit-cluster-secrets branch 3 times, most recently from 251ffc4 to b5365d2 October 21, 2025 13:33
James Healy and others added 15 commits October 21, 2025 22:40
An experiment in changing the Rails CI pipeline from "self-hosted"
agents to "hosted" agents, a recently released Buildkite feature [1].

The hosted agents linux environment is superficially quite similar to
the Elastic Stack for AWS, so the required changes are fairly minimal.
Roughly half the changes take advantage of performance
optimisations available on hosted agents (like cache volumes, and
remote buildkit builders with caches that last across builds).

The essential changes:

* Read the OCI registry from the environment rather than hard-coding an
  ECR registry. The current self-hosted agents run in AWS and can access
  ECR, but the hosted agent environment has access to its own registry
  specifically for use cases like this: building an image at the start
  of the build and then reusing it in later jobs.
* Change the queue from `default` or `builder` to `hosted`.
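A minimal sketch of the two essential changes, written in Python purely for illustration (the real pipeline generator may look quite different). It prefers a registry advertised by the environment and falls back to the ECR repository hard-coded in the generated pipeline. The variable name `BUILDKITE_HOSTED_REGISTRY_URL` is an assumption to verify against the hosted agents docs, and `image_repository`, `agent_queue` and the idea that image-build steps map to the `builder` queue are hypothetical.

```python
import os
from typing import Mapping, Optional

# ECR repository hard-coded by the self-hosted pipeline (taken from the
# generated pipeline diff in this PR).
SELF_HOSTED_REPOSITORY = "973266071021.dkr.ecr.us-east-1.amazonaws.com/builds"

# ASSUMPTION: name of the variable exposing the hosted agents' registry.
# Verify against the Buildkite hosted agents documentation before use.
REGISTRY_ENV_VAR = "BUILDKITE_HOSTED_REGISTRY_URL"


def image_repository(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer a registry advertised by the environment, else fall back to ECR."""
    env = os.environ if env is None else env
    registry = env.get(REGISTRY_ENV_VAR, "").strip()
    # Drop a trailing slash so callers can append "/name:tag" safely.
    return registry.rstrip("/") if registry else SELF_HOSTED_REPOSITORY


def agent_queue(hosted: bool, builder_step: bool = False) -> str:
    """Pick the agent queue (hypothetical mapping of steps to queues)."""
    if hosted:
        # On hosted agents every step runs on the single `hosted` queue.
        return "hosted"
    # Self-hosted: image builds on `builder`, everything else on `default`.
    return "builder" if builder_step else "default"
```

Keeping the ECR fallback means the same generator keeps working on the self-hosted agents while the hosted queue is being tried out.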

Optimisations:

* There's no need for the docker-compose plugin's `cache-from` and
  `image-name` shenanigans. The images built at the start of each build
  use a remote buildkit builder with a cache that is shared between
  builds. The cache is typically warm, and when it is, the image build
  time drops from ~2 min to ~18 s.
* Use plain buildkit to build the images, without the docker-compose
  plugin. This avoids exporting the image from buildkit to docker,
  and when the buildkit cache is warm the jobs complete in as little as
  18 s. This bypasses docker-compose's built-in support for separating
  building and running, but the docker-compose.yml already effectively
  bypasses that by hard-coding the image used in the run jobs (via the
  `IMAGE_NAME` env var).
* ~Create a cache volume for ruby gems that are installed in docker
  during the initial step. This shaves ~30s off the build time~
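Roughly, the plain buildkit build amounts to a `docker buildx build` against the remote builder that pushes straight to the registry. The sketch below only assembles the command line; the `buildx_command` helper, the builder name, the Dockerfile path and the build arg are hypothetical placeholders, not the pipeline's actual values. Because it passes `--push` and never `--load`, the image goes from buildkit to the registry without being exported into the local docker engine.

```python
import shlex
from typing import Dict, List, Optional


def buildx_command(
    tag: str,
    builder: str,
    dockerfile: str = ".buildkite/Dockerfile",  # hypothetical path
    context: str = ".",
    build_args: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble the argv for a plain BuildKit image build (not executed here)."""
    argv = [
        "docker", "buildx", "build",
        "--builder", builder,  # the remote builder whose cache persists
        "--file", dockerfile,
        "--tag", tag,
    ]
    # Sorted so the generated command is stable between pipeline runs.
    for key, value in sorted((build_args or {}).items()):
        argv += ["--build-arg", f"{key}={value}"]
    # --push sends the result directly to the registry; no --load, so the
    # image is never exported into the local docker engine.
    argv += ["--push", context]
    return argv


if __name__ == "__main__":
    cmd = buildx_command(
        tag="registry.example.com/builds:ruby-3-4-example",  # hypothetical tag
        builder="remote-builder",  # hypothetical builder name
        build_args={"RUBY_VERSION": "3.4"},  # hypothetical build arg
    )
    print(shlex.join(cmd))
```

The run jobs would then pick up the same tag through `IMAGE_NAME`, which the docker-compose.yml already reads.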

[1] https://buildkite.com/docs/pipelines/hosted-agents/overview

This should make it possible to see, for example, whether the expected
image tag being built carries over.

```diff
-    - docker-compose#v4.16.0:
+    - docker-compose#v5.0.0:
         build: base
         config: ".buildkite/docker-compose.yml"
         env:
         - PRE_STEPS
         - RACK
-        image-name: ruby-3-4-build_id
         cache-from:
         - base:973266071021.dkr.ecr.us-east-1.amazonaws.com/builds:ruby-3-4-br-main
         push:
         - base:973266071021.dkr.ecr.us-east-1.amazonaws.com/builds:ruby-3-4-br-
-        image-repository: 973266071021.dkr.ecr.us-east-1.amazonaws.com/builds
```

Notice how the tag is only `ruby-3-4-br-` because the build id was
missing from the environment when generating the pipeline.
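One way to keep a missing build id from silently producing a truncated tag like `ruby-3-4-br-` is to fail while generating the pipeline. A hypothetical helper, assuming the suffix comes from Buildkite's `BUILDKITE_BUILD_ID` variable as described above; the function and exception names are illustrative only.

```python
import os
from typing import Mapping, Optional


class MissingBuildIdError(RuntimeError):
    """Raised when the pipeline is generated without a build id."""


def tag_with_build_id(prefix: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Append the build id to a tag prefix, refusing to emit a truncated tag."""
    env = os.environ if env is None else env
    build_id = env.get("BUILDKITE_BUILD_ID", "").strip()
    if not build_id:
        # Fail loudly instead of pushing e.g. "ruby-3-4-br-".
        raise MissingBuildIdError(
            f"BUILDKITE_BUILD_ID is not set; refusing to emit truncated tag {prefix!r}"
        )
    return f"{prefix}{build_id}"
```

With this, the generation step fails immediately instead of producing a pipeline whose push and run steps silently disagree about the image tag.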
@zzak zzak force-pushed the buildkit-cluster-secrets branch from b5365d2 to 89ddf20 October 21, 2025 13:41
@zzak
Member Author

zzak commented Oct 21, 2025

Thanks for the review. 🙇 I cleaned up the git history and rebased.

This version still uses the self-hosted agents, but adds placeholders for switching to the Buildkite hosted infrastructure.

Tested using:

```sh
bin/trigger-pipeline --fork zzak --config_branch "buildkit-cluster-secrets" rails rails-ci
```

I'm not sure whether the hosted build finished; it was queued and I went to bed 😂 Will check back on this later.

@rafaelfranca rafaelfranca merged commit 680bd04 into rails:main Oct 23, 2025
1 check was pending
@rafaelfranca
Member

I merged this, but just realized that the hosted build did not work at all.

See:

```
/bin/bash: line 28: docker: command not found
Generating pipeline:
sh: 1: docker: not found
```

@zzak
Copy link
Member Author

zzak commented Oct 23, 2025

Yeah sorry, I'm making time today to work on this. 🙏

@zzak zzak deleted the buildkit-cluster-secrets branch October 23, 2025 22:50
@p8
Member

p8 commented Oct 24, 2025

This seems to break CI for rails/rails. Could it be reverted until a fix is made?
