fix: add retry logic for planning-to-coding transition (#495) #1276

kaigler · 2026-01-18T01:09:15Z

Summary

Fixes race condition where tasks get stuck after planning phase completes
Root cause: get_next_subtask() may return None briefly due to file I/O timing
Solution: Add retry logic with exponential backoff when transitioning from planning to coding

Changes

Added just_transitioned_from_planning flag in apps/backend/agents/coder.py
Added retry loop (3 attempts, 2s/4s/6s delays) when no subtask found after planning
Updates subtask_id and phase_name after successful retry
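The change list above can be sketched as a standalone Python helper. This is a hedged illustration, not the code in `apps/backend/agents/coder.py`, which inlines the logic. The helper name `wait_for_subtask_after_planning` is hypothetical. `get_next_subtask()` is modeled as a zero-argument callable that returns a dict or `None`. Subtasks are dicts with `id` and `phase_name` keys, as in the diff. `sleep` is injectable so the retry path can run without real waiting. If the agent loop is async, `await asyncio.sleep(delay)` would take the place of the `sleep` call.

```python
import time
from typing import Callable, Optional


# Hypothetical helper: coder.py inlines this logic in its main loop.
def wait_for_subtask_after_planning(
    get_next_subtask: Callable[[], Optional[dict]],
    just_transitioned_from_planning: bool,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Return the next subtask, retrying briefly right after planning."""
    next_subtask = get_next_subtask()
    if next_subtask or not just_transitioned_from_planning:
        # Either a subtask is ready, or this is not the planning handoff,
        # so keep the original behavior and do not wait.
        return next_subtask

    # Planning just finished, but file I/O timing may hide the new subtasks:
    # retry 3 times, sleeping 2s, 4s, then 6s before each re-check.
    for retry_attempt in range(3):
        delay = (retry_attempt + 1) * 2
        sleep(delay)
        next_subtask = get_next_subtask()
        if next_subtask:
            return next_subtask
    return None  # caller falls back to the original "no subtask" flow


# Usage with sample data: get_next_subtask() returns None twice, then a subtask.
responses = iter([None, None, {"id": "1.1", "phase_name": "Backend"}])
next_subtask = wait_for_subtask_after_planning(
    lambda: next(responses),
    just_transitioned_from_planning=True,
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
if next_subtask:
    # Mirror the PR: refresh subtask_id and phase_name after a successful retry
    subtask_id = next_subtask.get("id")
    phase_name = next_subtask.get("phase_name")
```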

Test Plan

Tested via CLI: python run.py --spec 002 --force --auto-continue
Tested via Electron frontend: Started task from UI
Both successfully transitioned from planning to coding
Subtasks completed without getting stuck
All 1575 backend tests pass locally

Related Issues

Fixes "Task execution stops after planning phase despite approval" #495
Related to "Won't start working on a task" #457 and "Agent never gets subtasks, so system doesn't operate" #480 (similar symptoms)
PR "fix: add missing status transition from approval to execution" #496 was closed without merging (it took a different approach)

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved agent stability during the planning-to-coding phase transition, including automatic retry logic that waits out brief timing delays before the first subtask becomes available.


The coder agent could get stuck after planning completes because get_next_subtask() may return None briefly due to file I/O timing.

- Add just_transitioned_from_planning flag to detect transition
- Retry with exponential backoff (2s, 4s, 6s) after planning
- Update subtask_id and phase_name after successful retry

Fixes AndyMik90#495

coderabbitai · 2026-01-18T01:09:26Z

📝 Walkthrough


Introduces a just_transitioned_from_planning flag to track planning-to-coding phase transitions in the coder agent. Adds a retry mechanism with exponential backoff (2s, 4s, 6s) that re-checks for pending subtasks up to three times when no next_subtask is available immediately after transition, handling potential race conditions during phase change.

Changes

Cohort / File(s)	Summary
Retry logic for planning-to-coding transition (`apps/backend/agents/coder.py`)	Added `just_transitioned_from_planning` flag to gate race-condition handling. Implements exponential backoff retry mechanism (3 attempts with 2s, 4s, 6s delays) to wait for next_subtask availability after planning phase completes. Falls back to original flow if no subtasks are found after retries.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

bug, size/S

Suggested reviewers

AlexMadera

Poem

🐰 A race condition sparked a fix so neat,
With flags and retries, the logic's complete!
Planning to coding, a transition so grand,
Now subtasks await as we carefully planned! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR implements retry logic for a race condition during planning-to-coding transition, but the linked issue `#495` requires explicit status transition from 'human_review' to 'in_progress' with approval validation, which is not present in the implementation.	Implement the explicit status transition logic checking plan.status == 'human_review' and plan.planStatus == 'review', then validate approval and transition to 'in_progress' before retrying for subtasks.
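The resolution proposed by this check could look roughly like the following Python sketch. It is not what the PR implements. It assumes the plan is available as a plain dict. `status` and `planStatus` are the field names quoted in the check. `approved` is a hypothetical stand-in, because the check does not say how approval is recorded. `ready_for_coding` is likewise a hypothetical name.

```python
from typing import Any, Dict


def ready_for_coding(plan: Dict[str, Any]) -> bool:
    """Promote an approved plan from human review to in_progress.

    "status" and "planStatus" are the field names quoted in the check;
    "approved" is a hypothetical stand-in for the real approval record.
    """
    awaiting_review = (
        plan.get("status") == "human_review" and plan.get("planStatus") == "review"
    )
    if awaiting_review:
        if not plan.get("approved", False):
            return False  # still waiting for a human to approve the plan
        plan["status"] = "in_progress"  # explicit transition before coding
    return plan.get("status") == "in_progress"


plan = {"status": "human_review", "planStatus": "review", "approved": True}
print(ready_for_coding(plan), plan["status"])  # True in_progress
```

Under this approach, the retry loop for subtasks would run only after `ready_for_coding` returns True.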

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly describes the main change: adding retry logic for the planning-to-coding transition, which matches the core implementation of the retry mechanism with exponential backoff.
Out of Scope Changes check	✅ Passed	All changes are directly related to implementing retry logic for the planning-to-coding transition, which addresses the race condition mentioned in issue `#495`.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



gemini-code-assist · 2026-01-18T01:09:28Z

Summary of Changes

Hello @kaigler, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the stability of the system by resolving a critical race condition that could halt task progression. By implementing a strategic retry mechanism during the transition from planning to coding, it ensures that the system reliably identifies and processes subsequent subtasks, preventing tasks from becoming unresponsive and improving overall workflow continuity.

Highlights

Race Condition Fix: Addresses a race condition where tasks could get stuck after the planning phase due to get_next_subtask() briefly returning None.
Retry Logic Implementation: Introduces robust retry logic with exponential backoff (3 attempts, 2s/4s/6s delays) when transitioning from planning to coding, ensuring subtasks are properly picked up.
State Management: A new just_transitioned_from_planning flag is used to specifically trigger the retry mechanism only when immediately following the planning phase.


github-actions

🎉 Thanks for your first PR!

A maintainer will review it soon. Please make sure:

Your branch is synced with develop
CI checks pass
You've followed our contribution guide

Welcome to the Auto Claude community!

sentry · 2026-01-18T01:10:25Z

Codecov Report

❌ Patch coverage is 11.76471% with 15 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
apps/backend/agents/coder.py	11.76%	15 Missing ⚠️


apps/backend/agents/coder.py

+                        if next_subtask:
+                            # Update subtask_id and phase_name after successful retry
+                            subtask_id = next_subtask.get("id")
+                            phase_name = next_subtask.get("phase_name")


gemini-code-assist

Code Review

This pull request introduces a retry mechanism to address a race condition that occurs when transitioning from the planning to the coding phase. The changes look solid and directly address the issue described. The use of a flag to detect the transition is a good approach.

My review includes a suggestion to align the backoff implementation with the 'exponential backoff' mentioned in the comments and PR description, as the current implementation is linear. I've also recommended extracting hardcoded values for retries and delays into constants to improve code maintainability.

Overall, this is a valuable fix for a tricky timing issue.

gemini-code-assist · 2026-01-18T01:11:05Z

apps/backend/agents/coder.py

+                    for retry_attempt in range(3):
+                        delay = (retry_attempt + 1) * 2  # 2s, 4s, 6s


This retry logic is a good improvement. To make it even better, I have two suggestions:

Exponential vs. Linear Backoff: The comment on line 346 indicates 'exponential backoff', but the delay calculation on line 352 is linear. It would be better to align the implementation with the comment by using an actual exponential backoff.

Magic Numbers: The retry count 3 and the delay base 2 are hardcoded. Extracting these into named constants (e.g., PLANNING_TRANSITION_RETRIES, RETRY_DELAY_BASE_SECONDS) at a higher scope would improve readability and maintainability.

Here's a suggestion that implements exponential backoff. The constants for retry count and delay base can be defined elsewhere.

Suggested change

                     for retry_attempt in range(3):
-                        delay = (retry_attempt + 1) * 2  # 2s, 4s, 6s
+                        delay = 2 ** (retry_attempt + 1)  # Exponential backoff: 2s, 4s, 8s
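The difference between the two schedules is easy to check directly. The helper names below are hypothetical. With a base of 2, `base * 2 ** attempt` gives the same 2s, 4s, 8s schedule as the suggested `2 ** (retry_attempt + 1)`, and it keeps the base delay adjustable independently of the growth factor.

```python
from typing import List


def linear_backoff(retries: int, base: int) -> List[int]:
    """Delays grow by a fixed step: base, 2*base, 3*base, ... (current PR)."""
    return [(attempt + 1) * base for attempt in range(retries)]


def exponential_backoff(retries: int, base: int) -> List[int]:
    """Delays double on each attempt: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(retries)]


print(linear_backoff(3, 2), sum(linear_backoff(3, 2)))            # [2, 4, 6] 12
print(exponential_backoff(3, 2), sum(exponential_backoff(3, 2)))  # [2, 4, 8] 14
```

With only three attempts, the worst-case wait differs by 2 seconds (12s vs 14s). This is consistent with the later review's view that the mismatch is a comment-accuracy issue rather than a behavior bug.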

AndyMik90

✅ Auto Claude Review - APPROVED

Status: Ready to Merge

Summary: Merge Verdict: ✅ READY TO MERGE

✅ Ready to merge - All checks passing, no blocking issues found.

No blocking issues. 4 non-blocking suggestion(s) to consider

Risk Assessment

Factor	Level	Notes
Complexity	Low	Based on lines changed
Security Impact	None	Based on security findings
Scope Coherence	Good	Based on structural review

Findings Summary

Low: 4 issue(s)

Generated by Auto Claude PR Review

💡 Suggestions (4)

These are non-blocking suggestions for consideration:

🔵 [9a3b10490c71] [LOW] Comment says 'exponential backoff' but implementation is linear

📁 apps/backend/agents/coder.py:346

The comment on line 346 states 'Retry with exponential backoff' but the implementation delay = (retry_attempt + 1) * 2 produces 2s, 4s, 6s (linear progression). True exponential backoff would be 2 ** (retry_attempt + 1) producing 2s, 4s, 8s. This is a documentation accuracy issue - the actual delays work fine for the use case.

Suggested fix:

Either update comment to 'Retry with linear backoff before giving up.' or change formula to `delay = 2 ** (retry_attempt + 1)` for true exponential (2s, 4s, 8s).

🔵 [75ceb8034f37] [LOW] Success message reports single-iteration delay instead of cumulative wait time

📁 apps/backend/agents/coder.py:359

When a subtask is found after retry, the message reports f'Found subtask {subtask_id} after {delay}s delay'. However, delay is only the current iteration's delay, not cumulative time. For example, if found on retry 2 (after sleeping 2s then 4s), message says '4s delay' when actual wait was 6s total. Minor but could cause debugging confusion.

Suggested fix:

Track cumulative delay: `total_delay = 0` before loop, `total_delay += delay` after each sleep, then report `f'Found subtask {subtask_id} after {total_delay}s total delay'`
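The suggested fix can be sketched as follows. The helper name `retry_with_total_delay` and the injectable `sleep` are assumptions for illustration. The `total_delay` bookkeeping and the message text follow the suggestion.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def retry_with_total_delay(
    get_next_subtask: Callable[[], Optional[dict]],
    delays: Iterable[int],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[dict], int]:
    """Retry after each delay and report the cumulative wait, not the last delay."""
    total_delay = 0
    for delay in delays:
        sleep(delay)
        total_delay += delay  # accumulate across iterations
        next_subtask = get_next_subtask()
        if next_subtask:
            subtask_id = next_subtask.get("id")
            print(f"Found subtask {subtask_id} after {total_delay}s total delay")
            return next_subtask, total_delay
    return None, total_delay


# Found on the second retry: sleeps of 2s and 4s give 6s in total.
responses = iter([None, {"id": "2.1"}])
retry_with_total_delay(lambda: next(responses), [2, 4, 6], sleep=lambda s: None)
# prints: Found subtask 2.1 after 6s total delay
```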

🔵 [c41e1c93e801] [LOW] Consider extracting retry configuration as named constants

📁 apps/backend/agents/coder.py:351

The retry count (3) and base delay (2) are hardcoded inline. The codebase has patterns for such constants (e.g., MAX_RETRIES in spec/phases/models.py, AUTO_CONTINUE_DELAY_SECONDS in agents/base.py). Extracting these would improve discoverability and make tuning easier. The inline comment # 2s, 4s, 6s documents the behavior adequately for now.

Suggested fix:

Add constants to agents/base.py: `PLAN_READY_MAX_RETRIES = 3` and `PLAN_READY_BASE_DELAY_SECONDS = 2`, then use them in the loop.
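A minimal sketch of this suggestion follows. The constant names come from the review comment and do not exist in the codebase yet. `plan_ready_delays()` is a hypothetical helper that keeps the current 2s/4s/6s schedule while making both values tunable in one place.

```python
from typing import Iterator

# Proposed for agents/base.py (names from the review, not existing code).
PLAN_READY_MAX_RETRIES = 3
PLAN_READY_BASE_DELAY_SECONDS = 2


def plan_ready_delays() -> Iterator[int]:
    """Yield per-attempt delays for the planning-to-coding retry.

    Hypothetical helper: with the values above it yields 2, 4, 6,
    matching the current linear schedule in coder.py.
    """
    for retry_attempt in range(PLAN_READY_MAX_RETRIES):
        yield (retry_attempt + 1) * PLAN_READY_BASE_DELAY_SECONDS


# In coder.py the loop could then read, roughly:
#     for delay in plan_ready_delays():
#         time.sleep(delay)
#         next_subtask = get_next_subtask()  # real arguments omitted here
print(list(plan_ready_delays()))  # [2, 4, 6]
```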

🔵 [381534ff5b73] [LOW] AI Triage: GitHub Advanced Security 'variable defined multiple times' is FALSE POSITIVE

📁 apps/backend/agents/coder.py:358

GitHub Advanced Security flagged line 358 phase_name = next_subtask.get("phase_name") as unnecessary. This is incorrect - the initial assignment on line 248 IS used on line 252 (if phase_name:) and line 269 (print_session_header). The reassignment on line 358 only occurs in the specific retry-success branch. Both assignments serve distinct purposes in different code paths.

Suggested fix:

No fix needed - this is a false positive from the static analysis tool.

This automated review found no blocking issues. The PR can be safely merged.

Generated by Auto Claude

github-actions bot reviewed Jan 18, 2026


github-advanced-security bot found potential problems Jan 18, 2026


gemini-code-assist bot reviewed Jan 18, 2026


kaigler mentioned this pull request Jan 18, 2026 in Issues with Timeout #877 (open, 1 task)

AndyMik90 self-assigned this Jan 18, 2026

AndyMik90 approved these changes Jan 18, 2026


AndyMik90 merged commit b865590 into AndyMik90:develop Jan 18, 2026
26 checks passed

kaigler deleted the fix/495-planning-to-coding-race-condition branch January 18, 2026 15:35

This was referenced Jan 19, 2026:

- fix: Human review checkpoint using stopped status (Issue #1231) #1334 (open)
- Fix human_review status bugs (#1149, #509) #1425 (closed)
