
feat(cloudimg): parallel Range download for image pull #76

Merged: CMGS merged 3 commits into master from feat/cloudimg-range-pull on Jul 3, 2026

CMGS commented Jul 3, 2026

Contributor

What

Cloud-image pull was a single serial GET — the whole multi-GB download rode one TCP stream. Now:

  • Probe with Range: bytes=0-0; on 206 + parseable Content-Range total → split the blob across PullConns (default 8) concurrent Range connections into a preallocated temp file (io.NewOffsetWriter, disjoint offsets, no locking); any range failure aborts the whole download (fail-fast via utils.Map)
  • Fallback: anything else (200, no usable size) → the existing serial stream, behavior unchanged
  • Post-download sha256 re-read from disk (parallel writes can't feed a streaming hasher — this also verifies what actually landed)
  • Range requests reuse the probe's post-redirect URL, so presigned/CDN redirects keep working
  • Per-range LimitReader guards against a server over-serving a range (would otherwise overwrite the next region)
  • New knob: pull_conns (config/json), EffectivePullConns(); cloudimg.New(ctx, rootDir, pullConns) mirrors oci.New's poolSize shape; the <=0 → default rule now has a single owner (utils.OrDefault, PoolSizeOrDefault delegates)
  • Progress: one mutex-guarded progressCounter serves both paths (old progressWriter deleted); same 1 MiB cadence and event shape, emit outside the lock
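The split-and-write step above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `span`, `splitRanges`, `writeSpan`, and `assemble` are hypothetical names, the "server" is simulated with an in-memory reader that deliberately over-serves each range, and a plain `sync.WaitGroup` stands in for `utils.Map` (so there is no early cancellation of sibling workers here).

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"sync"
)

// span is an inclusive byte range [start, end], matching HTTP Range semantics.
// Hypothetical type for this sketch.
type span struct{ start, end int64 }

// splitRanges divides total bytes into at most conns contiguous spans.
// The last span is clamped so an uneven total is still covered exactly.
func splitRanges(total int64, conns int) []span {
	if total <= 0 || conns <= 0 {
		return nil
	}
	chunk := (total + int64(conns) - 1) / int64(conns) // ceiling division
	var out []span
	for start := int64(0); start < total; start += chunk {
		end := start + chunk - 1
		if end > total-1 {
			end = total - 1 // clamp the final, shorter span
		}
		out = append(out, span{start, end})
	}
	return out
}

// writeSpan copies at most the span's length from body into f at the span's
// offset. The LimitReader keeps an over-serving body out of the next region.
func writeSpan(f *os.File, s span, body io.Reader) (int64, error) {
	want := s.end - s.start + 1
	w := io.NewOffsetWriter(f, s.start)
	return io.Copy(w, io.LimitReader(body, want))
}

// assemble writes src into a preallocated temp file using one goroutine per
// span, then reads the file back. Offsets are disjoint, so no lock is needed.
func assemble(src []byte, conns int) ([]byte, error) {
	f, err := os.CreateTemp("", "range-sketch-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if err := f.Truncate(int64(len(src))); err != nil { // preallocate
		return nil, err
	}
	spans := splitRanges(int64(len(src)), conns)
	errs := make([]error, len(spans))
	var wg sync.WaitGroup
	for i, s := range spans {
		wg.Add(1)
		go func(i int, s span) {
			defer wg.Done()
			// Simulated over-serving server: the body runs to the end of the blob.
			_, errs[i] = writeSpan(f, s, bytes.NewReader(src[s.start:]))
		}(i, s)
	}
	wg.Wait()
	for _, e := range errs {
		if e != nil {
			return nil, e // any failed span fails the whole download
		}
	}
	return os.ReadFile(f.Name())
}

func main() {
	src := []byte("0123456789abcdefghijklmnopqrstuvwxyz")
	got, err := assemble(src, 8)
	fmt.Println(bytes.Equal(got, src), err)
}
```

Each simulated body runs past its span on purpose, so the reassembled file only matches the source if the per-range limit holds.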

Trust boundary note: like the serial path, we don't verify the server's Content-Range offset per range — the transport enforces length, and the digest is computed from disk.
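For reference, computing the digest from disk after the download can look roughly like this. It is a sketch only: `hashFile` and `hashViaDisk` are hypothetical names, and the PR's own helper may differ in shape.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// hashFile streams the file at path through SHA-256 and returns the hex digest.
// It hashes the bytes that actually landed on disk, not what was sent.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashViaDisk is a demo helper: it writes data to a temp file, closes it,
// and then hashes the file with hashFile.
func hashViaDisk(data []byte) (string, error) {
	f, err := os.CreateTemp("", "digest-sketch-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	if _, err := f.Write(data); err != nil {
		f.Close()
		return "", err
	}
	if err := f.Close(); err != nil {
		return "", err
	}
	return hashFile(f.Name())
}

func main() {
	sum, err := hashViaDisk([]byte("hello"))
	fmt.Println(sum, err)
}
```

A digest taken this way is only as trustworthy as the completeness of the file: a preallocated file with an unwritten hole still hashes cleanly, so per-range length checks have to happen before this step.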

Verification

  • 7 unit tests incl. uneven-last-range (clamp branch) and probe-redirect (front server sees exactly 1 request)
  • make lint 0 issues (linux+darwin), full suite 25 pkgs green, fmt-check clean
  • Real-network e2e on bare-metal testbed: ubuntu-24.04-minimal cloudimg (~300 MB) pulled with pull_conns=8 and pull_conns=1 → identical digest (bd9023e71b7bb5bd…), 5.9s vs 8.3s on an already-fast mirror (throttled single-stream sources like ghcr are the real win, per the sibling cocoon-macos data)


CMGS commented Jul 3, 2026

Contributor Author

Review follow-up applied: range workers no longer accept a short/mislabeled 206 — Content-Range must match the requested span exactly, and io.Copy count must equal end-start+1 (a chunked early-EOF previously produced a zero-holed file that hashDigest would bless with a "valid" digest, poisoning the blob cache). Two regression tests added (ShortRangeBody, MismatchedContentRange), both negative-verified: FAIL without the fix, pass with it. lint 0×2, 25 pkgs green.

CMGS force-pushed the feat/cloudimg-range-pull branch from 1675477 to 42dd948 on July 3, 2026 14:23
CMGS added 2 commits July 3, 2026 22:35
Single-stream GET was the bottleneck for multi-GB cloud images. Probe
Range support, then split the blob across PullConns concurrent Range
connections into a preallocated file, re-hashing on disk afterwards;
fall back to the existing serial stream when the server lacks Range.
reseed verb, exec WaitDelay session-leak fix, machine-id setup fallback.

CMGS commented Jul 3, 2026

Contributor Author

Full review pass completed (reuse/altitude lens + comment tightening): probeRangeSupport 4-tuple collapsed to (*rangeProbe, error) (nil = fall back — invalid field combos now unrepresentable); two comments tightened (the probe godoc claimed Go drops Range across redirects — it doesn't; the real point is skipping the redirect hop per range). Per-range retry via utils.DoWithRetry was considered and deliberately rejected: retrying a range would double-count progress and need offset-rewrite machinery — the parallel path stays fail-fast; a failed pull is cheap to re-run and idempotent by digest.
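The "nil = fall back" shape can be illustrated with a sketch along these lines. The names `probeRangeSupport` and `rangeProbe` come from the comment above, but the struct fields, the `interpretProbe` split, and the `fakeResponse` helper are assumptions made for this example and may not match the merged code.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// rangeProbe holds what the parallel path needs. Field names are assumed.
type rangeProbe struct {
	finalURL string // post-redirect URL, reused for every range request
	total    int64  // blob size taken from Content-Range
}

// interpretProbe returns nil when the response does not prove usable Range
// support; a nil probe means "use the serial stream", not an error.
func interpretProbe(resp *http.Response) *rangeProbe {
	if resp.StatusCode != http.StatusPartialContent {
		return nil
	}
	_, tot, ok := strings.Cut(resp.Header.Get("Content-Range"), "/")
	if !ok {
		return nil
	}
	total, err := strconv.ParseInt(tot, 10, 64)
	if err != nil || total <= 0 {
		return nil // e.g. "bytes 0-0/*": size unknown
	}
	return &rangeProbe{finalURL: resp.Request.URL.String(), total: total}
}

// probeRangeSupport sends a one-byte Range request. Errors are transport
// failures only; an unsupported server yields (nil, nil).
func probeRangeSupport(ctx context.Context, c *http.Client, rawURL string) (*rangeProbe, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", "bytes=0-0")
	resp, err := c.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	// resp.Request is the last request sent, i.e. after any redirects.
	return interpretProbe(resp), nil
}

// fakeResponse builds a response for the demo so no network is needed.
func fakeResponse(status int, contentRange, finalURL string) *http.Response {
	u, _ := url.Parse(finalURL)
	h := http.Header{}
	if contentRange != "" {
		h.Set("Content-Range", contentRange)
	}
	return &http.Response{StatusCode: status, Header: h, Request: &http.Request{URL: u}}
}

func main() {
	p := interpretProbe(fakeResponse(206, "bytes 0-0/1000", "https://cdn.example/blob"))
	fmt.Println(p.total, p.finalURL)
	fmt.Println(interpretProbe(fakeResponse(200, "", "https://cdn.example/blob")) == nil)
}
```

Returning a pointer instead of a tuple means a caller cannot see, say, a "supported" flag paired with a zero size: either every field is valid or there is no probe at all.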

Also folded in: chore(os-image) commit bumping cocoon-agent pins to v0.1.6 (ubuntu install-agent.sh both arches + android 14.0/15.0 Dockerfiles, real checksums from the release). lint 0×2, 25 pkgs green.

CMGS force-pushed the feat/cloudimg-range-pull branch from 42dd948 to 8800f91 on July 3, 2026 14:35
CMGS merged commit 5461099 into master on Jul 3, 2026
4 checks passed
CMGS deleted the feat/cloudimg-range-pull branch on July 3, 2026 15:18