fix(cli-core): wait for running stacks to complete when one fails #3934
+67
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Related issue
Fixes #3206
Description
This PR fixes a critical bug where parallel stack deployments would crash and leave infrastructure in an inconsistent state when one stack failed during deployment.
The Problem:
When running
cdktf deploy --parallelism N
, if a stack failed while at max parallelism:Promise.race()
would throw immediately (line 472 in cdktf-project.ts)This bug was introduced in v0.10.0 (March 2022) and has affected all versions through v0.21.0+.
Root Cause:
The code used an unwrapped
Promise.race()
in the execute loop without proper error handling:When a promise rejects,
Promise.race()
immediately throws. This causes the execution loop to exit and the CLI process terminates viaexit(new Error(err))
, killing all active Terraform child processes.The Solution:
I wrapped
Promise.race()
with try-catch that breaks out of the loop instead of throwing immediately:Why this approach:
I chose to use
break
instead of throwing because it allows execution to reach the existingensureAllSettledBeforeThrowing
call at line 507-510. This existing infrastructure properly waits for all running stacks to complete before reporting the error, ensuring:In testing, it appears that the error handling for the promises is handled elsewhere, leading to duplicate errors if rethrowing the error directly in the catch block. The unit test checks for the error.
Testing:
I added a new test: "waits for running stacks to complete when one fails with limited parallelism"
parallelism: 2
with 4 stacks to trigger the bug scenarioTo properly test this, I also added
stack4
to theparallel-error
test fixture, as the bug only manifests when:if
block) ANDAll 25 tests pass with this change (487s runtime).
Checklist