feat(cloudimg): parallel Range download for image pull #76
Review follow-up applied: range workers no longer accept a short/mislabeled 206.
Branch updated from 1675477 to 42dd948.
Single-stream GET was the bottleneck for multi-GB cloud images. Probe Range support, then split the blob across PullConns concurrent Range connections into a preallocated file and re-hash it from disk afterwards; fall back to the existing serial stream when the server lacks Range support.
Full review pass completed (reuse/altitude lens + comment tightening). Also folded in: reseed verb, exec `WaitDelay` session-leak fix, machine-id setup fallback.
Branch updated from 42dd948 to 8800f91.
What
Cloud-image pull was a single serial GET, so the whole multi-GB download rode one TCP stream. Now:
- `Range: bytes=0-0` → `206` + parseable `Content-Range` total → split the blob across `PullConns` (default 8) concurrent Range connections into a preallocated temp file (`io.NewOffsetWriter`, disjoint offsets, no locking)
- Any range failure aborts the whole download (fail-fast via `utils.Map`)
- `LimitReader` guards against a server over-serving a range (which would otherwise overwrite the next region)
- `pull_conns` (config/json), `EffectivePullConns()`; `cloudimg.New(ctx, rootDir, pullConns)` mirrors `oci.New`'s `poolSize` shape; the `<=0 → default` rule now has a single owner (`utils.OrDefault`; `PoolSizeOrDefault` delegates)
- `progressCounter` serves both paths (old `progressWriter` deleted); same 1 MiB cadence and event shape, emit outside the lock

Trust boundary note: like the serial path, we don't verify the server's `Content-Range` offset per range; the transport enforces length, and the digest is computed from disk.

Verification
- `make lint`: 0 issues (linux+darwin); full suite 25 pkgs green; `fmt-check` clean
- `pull_conns=8` and `pull_conns=1` → identical digest (`bd9023e71b7bb5bd…`); 5.9s vs 8.3s on an already-fast mirror (throttled single-stream sources like ghcr are the real win, per the sibling cocoon-macos data)