hiroTamada commented Feb 6, 2026

Summary

Fixes two related issues with BuildKit cache handling:

  1. Global cache wasn't providing cache hits - Without image-manifest=true, BuildKit's registry cache stores layer references pointing to external registries rather than copying actual layer blobs
  2. Cache image push was failing - Hypeman's registry was attempting to convert cache images to ext4 format, but BuildKit uses a custom mediatype (application/vnd.buildkit.cacheconfig.v0) that can't be unpacked by standard OCI tools

Changes

Fix 1: Enable proper cache blob storage

  • lib/builds/builder_agent/main.go - Added image-manifest=true,oci-mediatypes=true to cache export options
  • lib/builds/cache.go - Updated ExportCacheArg() helper for consistency
  • lib/builds/cache_test.go - Updated test expectation

Fix 2: Skip conversion for cache images

  • lib/registry/registry.go - Skip triggerConversion() for cache/* repos since they're not runnable containers

Test coverage

  • lib/images/oci_test.go - Unit tests documenting the BuildKit cache mediatype limitation

Root causes

Issue 1: Without image-manifest=true, BuildKit stores layer references like:

sha256:abc123 -> docker.io/library/alpine@sha256:abc123

instead of copying the actual blob into the cache image. Ephemeral BuildKit instances can't resolve these references.
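As a rough illustration of Fix 1, the sketch below assembles a registry cache export value that carries both new options. The function name `exportCacheArg` and the ref layout are hypothetical and not taken from lib/builds/cache.go; only `image-manifest=true` and `oci-mediatypes=true` come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// exportCacheArg builds a value for BuildKit's --export-cache flag.
// Hypothetical helper: the real ExportCacheArg() in lib/builds/cache.go
// may take different inputs and add further options.
func exportCacheArg(ref string) string {
	opts := []string{
		"type=registry",
		"ref=" + ref,
		// Store real layer blobs in the cache image instead of
		// references to external registries.
		"image-manifest=true",
		"oci-mediatypes=true",
	}
	return strings.Join(opts, ",")
}

func main() {
	// The ref below reuses the example host and repo quoted later in this PR.
	fmt.Println(exportCacheArg("10.102.0.1:8083/cache/global/node"))
}
```

Keeping the options in a slice makes it easy to see in a test which flags are expected in the final string.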

Issue 2: When cache images are pushed, the registry triggers conversion to ext4 format. This calls umoci.UnpackRootfs, which expects the standard OCI config mediatype, but BuildKit uses:

application/vnd.buildkit.cacheconfig.v0

This caused the error: config blob is not correct mediatype application/vnd.oci.image.config.v1+json: application/vnd.buildkit.cacheconfig.v0
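To make the mismatch concrete, here is a minimal sketch that reads the config mediatype from a manifest and reports whether it is the standard OCI one. This is not the approach taken in this PR, which skips conversion by repo name, and it does not call umoci; `isUnpackable` and the `manifest` struct are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	ociConfigMediaType      = "application/vnd.oci.image.config.v1+json"
	buildkitCacheConfigType = "application/vnd.buildkit.cacheconfig.v0"
)

// manifest holds only the manifest field needed for this check.
type manifest struct {
	Config struct {
		MediaType string `json:"mediaType"`
	} `json:"config"`
}

// isUnpackable reports whether the manifest's config blob uses the
// standard OCI config mediatype. Hypothetical helper for illustration.
func isUnpackable(manifestJSON []byte) (bool, error) {
	var m manifest
	if err := json.Unmarshal(manifestJSON, &m); err != nil {
		return false, fmt.Errorf("parse manifest: %w", err)
	}
	return m.Config.MediaType == ociConfigMediaType, nil
}

func main() {
	// A stripped-down manifest carrying BuildKit's cache config mediatype.
	cache := []byte(`{"config":{"mediaType":"` + buildkitCacheConfigType + `"}}`)
	ok, err := isUnpackable(cache)
	fmt.Println(ok, err)
}
```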

Before (first tenant deployment)

#9 [1/3] FROM docker.io/onkernel/python3.11-base:0.1.1@sha256:...
#9 sha256:4831516... 28.23MB / 28.23MB 0.4s   ← Downloaded from Docker Hub
#9 extracting sha256:4831516... 2.1s done
... (many more layer downloads ~7s)

After

#9 [1/3] FROM docker.io/onkernel/python3.11-base:0.1.1@sha256:...
#9 DONE 0.0s   ← Cache hit from global cache

Test plan

  • Unit tests pass (go test ./lib/images/... ./lib/registry/... ./lib/builds/...)
  • E2E tested: pushed BuildKit cache image with custom mediatype to local server - no conversion error
  • Rebuild builder image with updated builder_agent
  • Re-run global cache population script
  • Deploy to fresh tenant and verify cache hits

Note

Medium Risk
Touches build caching behavior and registry post-push processing; incorrect cache flags or repo matching could reduce cache effectiveness or unintentionally skip conversion for some images.

Overview
Fixes BuildKit registry cache exports to work reliably in ephemeral builders by adding image-manifest=true (and oci-mediatypes=true) to cache export args for both global/admin and tenant caches.

Updates the shared CacheKey.ExportCacheArg() helper and its tests accordingly, adds lib/images/oci_test.go to document the BuildKit cache config mediatype incompatibility, and changes the registry to skip ext4 conversion for pushed cache/* images so cache pushes no longer fail.

Written by Cursor Bugbot for commit 1460df9.

Without image-manifest=true, BuildKit's registry cache stores layer
references pointing to external registries (e.g., docker.io) rather
than copying the actual layer blobs into the cache image. This causes
cache misses in ephemeral BuildKit instances (like our builder VMs)
because the layers aren't available locally.

With image-manifest=true, BuildKit creates a proper OCI image manifest
with all layer blobs stored in the registry, enabling cache hits even
in fresh BuildKit instances.

This fixes the issue where the global cache (populated by admin builds)
wasn't providing cache hits for tenant builds - the first deployment
for each tenant was re-downloading all base image layers from Docker Hub.

Co-authored-by: Cursor <cursoragent@cursor.com>

hiroTamada requested a review from rgarcia February 6, 2026 17:04

Adds a unit test that reproduces the production issue where hypeman fails
to pre-pull BuildKit cache images. The test creates a mock OCI layout with
BuildKit's cache config mediatype (application/vnd.buildkit.cacheconfig.v0)
and verifies that unpackLayers fails with the expected error.

This test documents the root cause: umoci expects standard OCI config
mediatype but BuildKit cache exports use a custom mediatype.

Co-authored-by: Cursor <cursoragent@cursor.com>

BuildKit exports cache with a custom mediatype
(application/vnd.buildkit.cacheconfig.v0) that can't be unpacked
by standard OCI tools like umoci. This caused errors when pushing
cache images to the registry:

  config blob is not correct mediatype
  application/vnd.oci.image.config.v1+json:
  application/vnd.buildkit.cacheconfig.v0

The fix skips the ext4 conversion step for cache/* repos since:
1. Cache images are not runnable containers
2. BuildKit imports them directly from the registry
3. There's no need to unpack or convert them locally

Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

The repo parameter passed to triggerConversion includes the Host header
prefix (e.g., "10.102.0.1:8083/cache/global/node"). The previous check
only used HasPrefix("cache/"), which would never match such a value.

Now checks for both patterns:
- HasPrefix("cache/") for edge case without host
- Contains("/cache/") for normal case with host prefix
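The two checks listed above can be sketched as a single predicate. The function name `isCacheRepo` is hypothetical; the actual code in lib/registry/registry.go may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// isCacheRepo reports whether repo refers to a BuildKit cache repository,
// with or without a leading "host:port/" prefix. Hypothetical helper.
func isCacheRepo(repo string) bool {
	return strings.HasPrefix(repo, "cache/") || strings.Contains(repo, "/cache/")
}

func main() {
	for _, repo := range []string{
		"10.102.0.1:8083/cache/global/node", // normal case with host prefix
		"cache/global/node",                 // edge case without host
		"10.102.0.1:8083/myapp",             // hypothetical non-cache repo
	} {
		fmt.Printf("%s -> skip conversion: %v\n", repo, isCacheRepo(repo))
	}
}
```

Note that Contains("/cache/") also matches any repo with a nested path segment named cache, such as a hypothetical "host/team/cache/app", so conversion would be skipped for those as well.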

Co-authored-by: Cursor <cursoragent@cursor.com>

hiroTamada merged commit f21c072 into main Feb 6, 2026
4 checks passed
hiroTamada deleted the fix/buildkit-cache-image-manifest branch February 6, 2026 21:20