fix: wait for image conversion before reporting build ready (KERNEL-863) #65
hiroTamada merged 2 commits into main
Fixes a race condition where build status would transition to "ready" before the image conversion completed, causing instance creation to fail.

The registry's triggerConversion() runs asynchronously after returning 201 to the builder. This meant the builder could report success and the build manager would set status="ready" while image conversion was still in progress.

Changes:
- Add imageManager dependency to build manager
- Add waitForImageReady() that polls until image status is "ready"
- Call waitForImageReady() before setting build to "ready" status
- If image conversion fails or times out, mark build as failed
Addresses PR review feedback:
1. Use buildCtx instead of ctx for waitForImageReady to respect build timeout during image conversion wait
2. Recalculate duration after waitForImageReady completes to accurately report total build time including image conversion
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    imageMgr.images[imageRef].Status = images.StatusConverting
    time.Sleep(100 * time.Millisecond)
    imageMgr.images[imageRef].Status = images.StatusReady
    }()
Test has data race when modifying image status
Low Severity
TestWaitForImageReady_WaitsForConversion has a data race: the goroutine at lines 105-110 directly writes to imageMgr.images[imageRef].Status while waitForImageReady concurrently reads the same Status field through GetImage. The mock's GetImage returns a pointer to the same Image struct without copying, so both goroutines access the same memory location without synchronization. This will cause failures when running tests with -race.
This is fine for now. We are not running tests with -race.
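Summary
- Add `waitForImageReady()` to poll image status before marking build complete

Problem
The registry's `triggerConversion()` runs asynchronously after returning 201 to the builder. This caused a race where the builder could report success and the build manager would set the build to `ready` while image conversion was still in progress.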
Solution
Build manager now waits for the image to be ready before reporting the build complete.

Test plan
- `TestWaitForImageReady_Success` - image already ready
- `TestWaitForImageReady_WaitsForConversion` - polls until ready
- `TestWaitForImageReady_ContextCancelled` - respects context timeout
- `TestWaitForImageReady_Failed` - handles failed conversion

Fixes KERNEL-863
Note
Addresses a race where builds reported `ready` before image conversion finished.
- `images.Manager` is injected into the `builds` manager and wiring (`ProvideBuildManager`, `wire_gen.go`)
- `waitForImageReady()` polling is invoked in `runBuild` before setting status to `ready`, using the build context (`buildCtx`) for accurate metrics/timeout
- Tests cover `waitForImageReady` success, conversion wait, context-cancel timeout, and failed conversion

Written by Cursor Bugbot for commit 450d496.