

aong-atlassian

Related issue

Fixes #3206

Description

This PR fixes a critical bug where parallel stack deployments would crash and leave infrastructure in an inconsistent state when one stack failed during deployment.

The Problem:

When running cdktf deploy --parallelism N, if a stack failed while at max parallelism:

  • Promise.race() would throw immediately (line 472 in cdktf-project.ts)
  • The execution loop would exit prematurely
  • Already-running Terraform child processes would be killed mid-deployment
  • Infrastructure would be left in an inconsistent state (partial resources created, state locks held, corrupted Terraform state files)

This bug was introduced in v0.10.0 (March 2022) and has affected all versions through v0.21.0+.

Root Cause:

The execution loop awaited an unguarded Promise.race() with no error handling:

if (runningStacks.length >= maxParallelRuns) {
  await Promise.race(runningStacks.map((s) => s.currentWorkPromise));
  continue;
}

When any of the raced promises rejects, Promise.race() rejects immediately, so the await throws. This causes the execution loop to exit, and the CLI process terminates via exit(new Error(err)), killing all active Terraform child processes.
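For context, here is a small standalone TypeScript sketch of the Promise.race() behavior involved (illustrative only, not code from the cdktf codebase; the names are made up): the awaited race rejects as soon as the first promise rejects, while the slower promise is still in flight and would be orphaned if the process exited at that point.

// Standalone illustration of the failure mode (names are hypothetical).
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function slowDeploy(name: string): Promise<string> {
  await delay(2000); // still in flight when the race rejects
  console.log(`${name} finished`);
  return name;
}

async function failingDeploy(name: string): Promise<string> {
  await delay(100);
  throw new Error(`${name} failed`);
}

async function main(): Promise<void> {
  const running = [slowDeploy("stack1"), failingDeploy("stack2")];
  try {
    await Promise.race(running); // rejects after ~100ms, long before stack1 finishes
  } catch (e) {
    // Without the fix, the CLI exits here and stack1's work is killed mid-flight.
    console.error("race rejected:", (e as Error).message);
  }
}

main();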

The Solution:

I wrapped Promise.race() in a try-catch that breaks out of the loop instead of letting the rejection propagate immediately:

if (runningStacks.length >= maxParallelRuns) {
  try {
    await Promise.race(runningStacks.map((s) => s.currentWorkPromise));
  } catch (e) {
    logger.debug(
      "Encountered an error in one of the stacks, allowing running stacks to finish before exit",
      e,
    );
    break;
  }
  continue;
}

Why this approach:

I chose break instead of rethrowing because it allows execution to reach the existing ensureAllSettledBeforeThrowing call at lines 507-510. That existing infrastructure waits for all running stacks to complete before reporting the error, ensuring:

  • No orphaned Terraform processes
  • All active deployments complete cleanly
  • Proper error reporting after all work is done
  • Clean shutdown with no inconsistent infrastructure state

In testing, it appears that rejections from these promises are already handled elsewhere, so rethrowing directly in the catch block would surface duplicate errors. The new unit test verifies that the error is still reported.
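For illustration, here is a minimal sketch of the "let everything settle, then rethrow" pattern that ensureAllSettledBeforeThrowing provides (the function below is a hypothetical stand-in, not the actual cdktf implementation):

// Hypothetical stand-in: wait for every in-flight promise to settle,
// then rethrow the original error so it is reported exactly once.
async function settleAllBeforeThrowing(
  error: unknown,
  inFlight: Promise<unknown>[],
): Promise<never> {
  // Promise.allSettled never rejects, so no running deployment is interrupted.
  await Promise.allSettled(inFlight);
  throw error;
}

Because the loop now breaks instead of rethrowing, execution falls through to this kind of call, so the failure is only surfaced after the remaining stacks have settled.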

Testing:

I added a new test: "waits for running stacks to complete when one fails with limited parallelism"

  • Uses parallelism: 2 with 4 stacks to trigger the bug scenario
  • Verifies that already-running stacks complete even when one fails
  • The test fails without the fix and passes with it

To properly test this, I also added stack4 to the parallel-error test fixture, as the bug only manifests when:

  1. Running at max parallelism AND
  2. There's another pending stack waiting (forcing re-entry into the if block) AND
  3. A running stack fails while waiting for a slot

All 25 tests pass with this change (487s runtime).
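To make the scenario concrete, here is a rough, self-contained TypeScript sketch of the scheduling behavior the test exercises (a simplified model, not the actual test, which uses the cdktf test harness and the parallel-error fixture; runWithParallelism, Task, and the stack names below are illustrative):

type Task = { name: string; run: () => Promise<void> };

// Simplified model of the fixed loop: at most `parallelism` tasks run at once;
// when one fails, stop starting new tasks but let the running ones finish.
async function runWithParallelism(tasks: Task[], parallelism: number): Promise<void> {
  const pending = [...tasks];
  const running: Promise<void>[] = [];
  let failure: unknown;

  while (pending.length > 0) {
    if (running.length >= parallelism) {
      try {
        await Promise.race(running);
      } catch (e) {
        failure = e; // same role as `break` in the PR: stop scheduling, keep running work alive
        break;
      }
      continue;
    }
    const task = pending.shift()!;
    const promise = task.run().finally(() => {
      running.splice(running.indexOf(promise), 1);
    });
    running.push(promise);
  }

  await Promise.allSettled(running); // already-running tasks complete cleanly
  if (failure !== undefined) throw failure;
}

// Example: 4 stacks, parallelism 2; stack2 fails while stack1 is still running.
const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const completed: string[] = [];

runWithParallelism(
  [
    { name: "stack1", run: async () => { await wait(300); completed.push("stack1"); } },
    { name: "stack2", run: async () => { await wait(50); throw new Error("stack2 failed"); } },
    { name: "stack3", run: async () => { await wait(10); completed.push("stack3"); } },
    { name: "stack4", run: async () => { await wait(10); completed.push("stack4"); } },
  ],
  2,
).catch((e) => {
  // Expect: stack1 completed before the error surfaced; stack3 and stack4 never started.
  console.log(completed, (e as Error).message); // [ 'stack1' ] 'stack2 failed'
});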

Checklist

  • I have updated the PR title to match CDKTF's style guide
  • I have run the linter on my code locally
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation if applicable (N/A - this is a bug fix, no user-facing API changes)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works if applicable
  • New and existing unit tests pass locally with my changes

@aong-atlassian aong-atlassian requested a review from a team as a code owner October 6, 2025 23:03
@aong-atlassian aong-atlassian requested review from ansgarm and mutahhir and removed request for a team October 6, 2025 23:03

vercel bot commented Oct 6, 2025

@aong-atlassian is attempting to deploy a commit to the HashiCorp Team on Vercel.

A member of the Team first needs to authorize it.


CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes

Have you signed the CLA already but the status is still pending? Recheck it.


@aong-atlassian
Author

Just waiting on my company's internal review for the CLA

Hopefully we can iterate on the fix in the meantime if needed 🙏


Development

Successfully merging this pull request may close these issues.

One stack failure when deploying multiple stack in parallel causes other stacks state to remain locked
