55 commits
- `20a30c1` fix: normalize Windows CLI executable paths with missing extensions (StillKnotKnown, Jan 19, 2026)
- `322ca16` test: add fs mocking tests for normalizeExecutablePath (StillKnotKnown, Jan 19, 2026)
- `8d8211e` test: use explicit mock default instead of delegating to mocked fs (StillKnotKnown, Jan 19, 2026)
- `08510d3` test: use describeUnix for Unix-generic behavior instead of describeM… (StillKnotKnown, Jan 19, 2026)
- `059cb4d` fix: address multiple root causes of Windows CLI validation failures (StillKnotKnown, Jan 19, 2026)
- `86468ac` fix: Windows CLI path validation - consolidate platform abstraction a… (StillKnotKnown, Jan 19, 2026)
- `a938921` refactor: use platform abstraction in high-priority files (StillKnotKnown, Jan 19, 2026)
- `111c9ff` refactor: use platform abstraction in MEDIUM/HIGH priority files (StillKnotKnown, Jan 19, 2026)
- `02ff75e` refactor: complete platform abstraction for remaining files (StillKnotKnown, Jan 19, 2026)
- `8aa42a3` test: add tests for new platform utility functions (StillKnotKnown, Jan 19, 2026)
- `b513d97` fix: platform abstraction improvements and CodeRabbit feedback (StillKnotKnown, Jan 19, 2026)
- `3d8cb65` test: fix getVenvPythonPath test for cross-platform compatibility (StillKnotKnown, Jan 19, 2026)
- `52ebd6a` fix: replace direct platform checks with platform abstraction (StillKnotKnown, Jan 19, 2026)
- `c4641a5` fix: replace direct platform checks in index.ts and improve GitHub CL… (StillKnotKnown, Jan 19, 2026)
- `4b1d2e0` test: add comprehensive test suite for platform/paths.ts (StillKnotKnown, Jan 19, 2026)
- `67ea01b` test: increase timeout for flaky Windows CI test (StillKnotKnown, Jan 19, 2026)
- `518c9f5` fix: replace remaining direct process.platform checks with platform a… (StillKnotKnown, Jan 19, 2026)
- `3c741ff` fix: use expandWindowsEnvVars for Windows path detection in worktree-… (StillKnotKnown, Jan 19, 2026)
- `6bf1776` fix: add proper environment variable fallbacks for Windows terminal p… (StillKnotKnown, Jan 19, 2026)
- `945a7d5` refactor: consolidate duplicate expandWindowsEnvVars implementations (StillKnotKnown, Jan 19, 2026)
- `7644dd1` test: add comprehensive test suite for windows-paths.ts (StillKnotKnown, Jan 19, 2026)
- `d91bb44` test: add pty-manager.test.ts with 20 passing tests (StillKnotKnown, Jan 19, 2026)
- `4d557db` refactor: use environment variable expansion in COMMON_BIN_PATHS (StillKnotKnown, Jan 19, 2026)
- `22b9f11` test: fix windows-paths.test.ts - 31/35 tests now passing (89%) (StillKnotKnown, Jan 19, 2026)
- `2d1ef25` test: fix lint and TypeScript errors in test files (StillKnotKnown, Jan 19, 2026)
- `6734179` lint: fix pre-commit hook failures (StillKnotKnown, Jan 19, 2026)
- `6824658` test: fix duplicate beforeEach hook in windows-paths.test.ts (StillKnotKnown, Jan 19, 2026)
- `27f44f0` Merge branch 'develop' into windows-user-path-validation (StillKnotKnown, Jan 19, 2026)
- `8114f62` test: address CodeRabbit feedback - fake timers, async tests, path no… (StillKnotKnown, Jan 19, 2026)
- `a7b5b95` refactor: remove unused imports and variables (CodeQL alerts) (StillKnotKnown, Jan 19, 2026)
- `d3a9f17` docs: map existing codebase (Jan 19, 2026)
- `161a9dc` docs: initialize project (Jan 19, 2026)
- `62b80e0` chore: add project config (Jan 19, 2026)
- `b2de1bf` Merge branch 'develop' into windows-user-path-validation (StillKnotKnown, Jan 19, 2026)
- `c3ad052` Merge branch 'develop' into windows-user-path-validation (StillKnotKnown, Jan 19, 2026)
- `26c50f0` Merge branch 'develop' into windows-user-path-validation (StillKnotKnown, Jan 19, 2026)
- `9ee39af` fix: comprehensive platform abstraction alignment for CLI path handling (StillKnotKnown, Jan 19, 2026)
- `f2ef1bb` fix: align remaining code with centralized platform abstraction (StillKnotKnown, Jan 19, 2026)
- `af5653f` fix: comprehensive platform abstraction for CLI tools and terminals (StillKnotKnown, Jan 19, 2026)
- `b67d69e` fix: resolve CodeQL and Biome linter issues (StillKnotKnown, Jan 19, 2026)
- `a5e8983` test: fix duplicate beforeEach hook linter warnings (StillKnotKnown, Jan 19, 2026)
- `a3a1520` fix: return normalized executable paths from CLI tool validation (StillKnotKnown, Jan 19, 2026)
- `7c621a2` fix: critical security and platform compatibility fixes (StillKnotKnown, Jan 19, 2026)
- `8bbd204` test: fix platform-specific test failures for cross-platform CI (StillKnotKnown, Jan 19, 2026)
- `604ae7a` test: fix biome-ignore comment syntax in platform.test.ts (StillKnotKnown, Jan 19, 2026)
- `e781c3f` fix: i18n support for error messages and linter fixes (StillKnotKnown, Jan 19, 2026)
- `a72846e` fix: comprehensive platform alignment and test coverage (StillKnotKnown, Jan 19, 2026)
- `bed2c01` fix: comprehensive CLI tool detection and platform alignment (StillKnotKnown, Jan 19, 2026)
- `88f61ff` fix: address CodeQL alerts and CI test failures (StillKnotKnown, Jan 20, 2026)
- `f774459` fix: address CodeRabbit feedback for platform abstraction and test qu… (StillKnotKnown, Jan 20, 2026)
- `6d3fce7` refactor: consolidate platform abstraction and remove duplicate helpers (StillKnotKnown, Jan 20, 2026)
- `f7a95e0` test: fix platform module mocks for getEnvVar and isSecurePath (StillKnotKnown, Jan 20, 2026)
- `022401e` test: update test files to use platform abstraction functions (StillKnotKnown, Jan 20, 2026)
- `ef2033d` test: fix platform mocks for debug-logger and remove unused import (StillKnotKnown, Jan 20, 2026)
- `5bca79f` test: add comprehensive test coverage for python-detector and path ed… (StillKnotKnown, Jan 20, 2026)
69 changes: 69 additions & 0 deletions .planning/PROJECT.md
@@ -0,0 +1,69 @@
# PR Review System Robustness

## What This Is

Improvements to Auto Claude's PR review system to make it trustworthy enough to replace human review. The system uses specialist agents (security, logic, quality, codebase-fit) with a finding-validator that re-investigates findings before presenting them. This milestone fixes gaps that cause false positives and missed context.

## Core Value

**When the system flags something, it's a real issue.** Trustworthy PR reviews that are faster, more thorough, and more accurate than human review.

## Requirements

### Validated

- ✓ Multi-agent PR review architecture — existing
- ✓ Specialist agents (security, logic, quality, codebase-fit) — existing
- ✓ Finding-validator for follow-up reviews — existing
- ✓ Dismissal tracking with reasons — existing
- ✓ CI status enforcement — existing
- ✓ Context gathering (diff, comments, related files) — existing

### Active

- [ ] **REQ-001**: Finding-validator runs on initial reviews (not just follow-ups)
- [ ] **REQ-002**: Fix line 1288 bug — include ai_reviews in follow-up context
- [ ] **REQ-003**: Fetch formal PR reviews from `/pulls/{pr}/reviews` API
- [ ] **REQ-004**: Add Read/Grep/Glob tool instructions to all specialist prompts
- [ ] **REQ-005**: Expand JS/TS import analysis (path aliases, CommonJS, re-exports)
- [ ] **REQ-006**: Add Python import analysis (currently skipped)
- [ ] **REQ-007**: Increase related files limit from 20 to 50 with prioritization
- [ ] **REQ-008**: Add reverse dependency analysis (what imports changed files)

### Out of Scope

- Real-time review streaming — complexity, not needed for accuracy goal
- Review caching/memoization — premature optimization
- Custom specialist agents — current four dimensions sufficient

## Context

**Problem**: False positives in PR reviews erode trust. Users have to second-guess every finding, defeating the purpose of automated review.

**Root cause**: Finding-validator (which catches false positives) only runs during follow-up reviews. Initial reviews present unvalidated findings. Additionally, context gathering has bugs and gaps that cause the AI to make claims without complete information.

**Existing system**:
- `apps/backend/runners/github/` — PR review orchestration
- `apps/backend/runners/github/services/parallel_orchestrator_reviewer.py` — initial review
- `apps/backend/runners/github/services/parallel_followup_reviewer.py` — follow-up review (has finding-validator)
- `apps/backend/runners/github/context_gatherer.py` — gathers PR context
- `apps/backend/prompts/github/pr_*.md` — specialist agent prompts

**Reference**: Full PRD at `docs/PR_REVIEW_SYSTEM_IMPROVEMENTS.md`

## Constraints

- **Existing architecture**: Work within current multi-agent PR review structure
- **Backward compatibility**: Don't break existing review workflows
- **Performance**: Validation step should not significantly slow reviews (run in parallel where possible)
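
One way to keep the extra validation pass from dominating review latency is to re-check findings concurrently with a cap on parallel sessions. This is a minimal sketch under assumptions: `Finding` is a hypothetical shape and `validate` is an async callable standing in for the finding-validator agent; neither reflects the real interfaces in `parallel_followup_reviewer.py`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Finding:
    """Hypothetical minimal finding shape; the real reviewer models will differ."""
    file: str
    line: int
    message: str


async def validate_findings(
    findings: list[Finding],
    validate: Callable[[Finding], Awaitable[bool]],
    max_concurrency: int = 5,
) -> list[Finding]:
    """Re-check every finding concurrently and keep only the confirmed ones."""
    semaphore = asyncio.Semaphore(max_concurrency)  # cap parallel validator sessions

    async def check(finding: Finding) -> bool:
        async with semaphore:
            return await validate(finding)

    verdicts = await asyncio.gather(*(check(f) for f in findings))
    # gather preserves input order, so verdicts line up with findings
    return [f for f, ok in zip(findings, verdicts) if ok]
```

Total latency then tracks the slowest single validation (per concurrency slot) rather than the sum of all of them.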

## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Add finding-validator to initial reviews | Catches false positives before user sees them | — Pending |
| Same validator for initial and follow-up | Consistency, proven approach from follow-up reviews | — Pending |
| Expand import analysis incrementally | JS/TS first (REQ-005), Python second (REQ-006) | — Pending |

---
*Last updated: 2026-01-19 after initialization*
193 changes: 193 additions & 0 deletions .planning/codebase/ARCHITECTURE.md
@@ -0,0 +1,193 @@
# Architecture

**Analysis Date:** 2026-01-19

## Pattern Overview

**Overall:** Multi-Agent Orchestration with Electron Desktop UI

**Key Characteristics:**
- Dual-app architecture: Python backend (CLI + agents) + Electron frontend (desktop UI)
- Agent-based autonomous coding via Claude Agent SDK
- Git worktree isolation for safe parallel development
- Phase-based pipeline execution for spec creation and implementation
- Event-driven IPC communication between frontend and backend

## Layers

**Frontend (Electron Main Process):**
- Purpose: Desktop application shell, native OS integration, IPC coordination
- Location: `apps/frontend/src/main/`
- Contains: Window management, IPC handlers, service managers (terminal, python env, CLI tools)
- Depends on: Backend Python CLI, Claude Code CLI
- Used by: Renderer process via IPC

**Frontend (Renderer Process):**
- Purpose: React-based user interface
- Location: `apps/frontend/src/renderer/`
- Contains: Components, Zustand stores, hooks, contexts
- Depends on: Main process via preload IPC bridge
- Used by: End users

**Backend Core:**
- Purpose: Authentication, SDK client factory, security, workspace management
- Location: `apps/backend/core/`
- Contains: `client.py` (SDK factory), `auth.py`, `worktree.py`, `workspace.py`, security hooks
- Depends on: Claude Agent SDK, project analyzer
- Used by: Agents, CLI commands, runners

**Backend Agents:**
- Purpose: AI agent implementations for autonomous coding
- Location: `apps/backend/agents/`
- Contains: Coder, planner, memory manager, session management
- Depends on: Core client, prompts, phase config
- Used by: CLI commands, QA loop

**Backend QA:**
- Purpose: Quality assurance validation loop
- Location: `apps/backend/qa/`
- Contains: QA reviewer, QA fixer, criteria validation, issue tracking
- Depends on: Agents, core client
- Used by: CLI commands after build completion

**Backend Spec:**
- Purpose: Spec creation pipeline with complexity-based phases
- Location: `apps/backend/spec/`
- Contains: Pipeline orchestrator, complexity assessment, validation
- Depends on: Core client, agents
- Used by: CLI spec commands, frontend task creation

**Backend Security:**
- Purpose: Command validation, allowlist management, secrets scanning
- Location: `apps/backend/security/`
- Contains: Validators, hooks, command parser, secrets scanner
- Depends on: Project analyzer
- Used by: Core client via pre-tool-use hooks

**Backend CLI:**
- Purpose: Command-line interface and argument routing
- Location: `apps/backend/cli/`
- Contains: Main entry, build/spec/workspace/QA commands
- Depends on: All backend modules
- Used by: Entry point (`run.py`), frontend terminal

## Data Flow

**Spec Creation Flow:**
1. User creates task via frontend or CLI (`--task "description"`)
2. `SpecOrchestrator` (`spec/pipeline/orchestrator.py`) initializes
3. Complexity assessment determines phase count (3-8 phases)
4. `AgentRunner` executes phases: Discovery -> Requirements -> [Research] -> Context -> Spec -> Plan -> Validate
5. Each phase uses Claude Agent SDK session with phase-specific prompts
6. Output: `spec.md`, `requirements.json`, `context.json`, `implementation_plan.json`

**Implementation Flow:**
1. CLI starts with `python run.py --spec 001`
2. `run_autonomous_agent()` in `agents/coder.py` orchestrates
3. Planner agent creates subtask-based `implementation_plan.json`
4. Coder agent implements subtasks in iteration loop
5. Each subtask runs as Claude Agent SDK session
6. On completion, QA validation loop runs (`qa/loop.py`)
7. QA reviewer validates -> QA fixer fixes issues -> loop until approved
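
Together with the `MAX_QA_ITERATIONS` escalation listed under Error Handling, the reviewer/fixer cycle is a bounded loop. A minimal sketch: `review` and `fix` are hypothetical callables standing in for the QA reviewer and QA fixer agents, and the limit of 3 is an assumed value.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_QA_ITERATIONS = 3  # assumed value for illustration only


@dataclass
class QAResult:
    """Hypothetical reviewer verdict: approval flag plus any open issues."""
    approved: bool
    issues: list[str] = field(default_factory=list)


def run_qa_loop(
    review: Callable[[], QAResult],
    fix: Callable[[list[str]], None],
    max_iterations: int = MAX_QA_ITERATIONS,
) -> str:
    """Alternate reviewer and fixer until approval; escalate after the last review."""
    for iteration in range(1, max_iterations + 1):
        result = review()
        if result.approved:
            return "approved"
        if iteration < max_iterations:
            fix(result.issues)  # only fix when another review will check the result
    return "escalated"  # hand off to human review
```

Skipping the fix on the final iteration is a design choice here: it avoids applying changes that no reviewer pass would ever check.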

**Frontend-Backend IPC Flow:**
1. Renderer component dispatches action (e.g., start task)
2. Zustand store calls `window.api.invoke('ipc-channel', args)`
3. Preload script bridges to main process
4. IPC handler in `ipc-handlers/` processes request
5. Handler spawns Python subprocess or manages terminal
6. Events streamed back via IPC to update stores

**State Management:**
- Frontend: Zustand stores per domain (`task-store`, `project-store`, `settings-store`, etc.)
- Backend: File-based state (`implementation_plan.json`, `qa_report.md`)
- Session recovery: `RecoveryManager` tracks agent sessions for resumption
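
Because backend state is file-based, resuming after an interruption can reduce to re-reading the plan and picking the first unfinished subtask. A minimal sketch, assuming a hypothetical `implementation_plan.json` layout with a flat `subtasks` list whose entries carry `id` and `status` fields; the real schema and `RecoveryManager` logic may differ.

```python
import json
from pathlib import Path
from typing import Optional


def next_pending_subtask(plan_path: Path) -> Optional[dict]:
    """Return the first subtask not yet completed, or None when the plan is done."""
    plan = json.loads(plan_path.read_text())
    for subtask in plan.get("subtasks", []):
        if subtask.get("status") != "completed":
            return subtask
    return None


def mark_completed(plan_path: Path, subtask_id: str) -> None:
    """Persist progress right away so a crash loses at most the in-flight subtask."""
    plan = json.loads(plan_path.read_text())
    for subtask in plan.get("subtasks", []):
        if subtask.get("id") == subtask_id:
            subtask["status"] = "completed"
    # write to a sibling temp file, then replace, so a partial write never corrupts the plan
    tmp = plan_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(plan, indent=2))
    tmp.replace(plan_path)
```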

## Key Abstractions

**ClaudeSDKClient:**
- Purpose: Configured Claude Agent SDK client with security hooks
- Examples: `apps/backend/core/client.py:create_client()`
- Pattern: Factory function with multi-layered security (sandbox, permissions, hooks)

**SpecOrchestrator:**
- Purpose: Coordinates spec creation pipeline phases
- Examples: `apps/backend/spec/pipeline/orchestrator.py`
- Pattern: Orchestrator with dynamic phase selection based on complexity

**WorktreeManager:**
- Purpose: Git worktree isolation for safe parallel builds
- Examples: `apps/backend/core/worktree.py`
- Pattern: Each spec gets isolated worktree branch (`auto-claude/{spec-name}`)
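
The worktree-per-spec pattern maps onto plain `git worktree add -b <branch> <path> <base>`. A minimal sketch that builds and runs that command, assuming a hypothetical `.worktrees/` directory under the project root and `HEAD` as the base; where `WorktreeManager` actually places worktrees and which base it branches from are not stated here.

```python
import subprocess
from pathlib import Path


def worktree_add_command(project_root: Path, spec_name: str, base: str = "HEAD") -> list[str]:
    """Build `git worktree add -b auto-claude/<spec> <path> <base>` for one spec."""
    branch = f"auto-claude/{spec_name}"
    path = project_root / ".worktrees" / spec_name  # hypothetical location
    return ["git", "-C", str(project_root), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(project_root: Path, spec_name: str) -> None:
    # check=True raises CalledProcessError if git rejects the command (e.g. branch exists)
    subprocess.run(worktree_add_command(project_root, spec_name), check=True)
```

Passing an argument list rather than a shell string keeps spec names from being interpreted by a shell.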

**SecurityProfile:**
- Purpose: Dynamic command allowlist based on project analysis
- Examples: `apps/backend/project_analyzer.py`, `apps/backend/security/`
- Pattern: Base + stack-specific + custom commands cached in `.auto-claude-security.json`
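
The layered allowlist can be pictured as a set union persisted to `.auto-claude-security.json`, plus a check that a pre-tool-use hook could call. A minimal sketch: the command sets, stack names, and JSON layout are assumptions for illustration, not the schema `project_analyzer.py` produces, and cache invalidation is omitted.

```python
import json
import shlex
from pathlib import Path

# Hypothetical command sets; the real base and per-stack lists live in the backend.
BASE_COMMANDS = {"ls", "cat", "grep", "git"}
STACK_COMMANDS = {"python": {"python", "pip", "pytest"}, "node": {"node", "npm", "npx"}}

CACHE_FILE = ".auto-claude-security.json"


def build_allowlist(stacks: list[str], custom: set[str]) -> set[str]:
    """Union of base, detected-stack, and project-specific commands."""
    allowed = set(BASE_COMMANDS) | set(custom)
    for stack in stacks:
        allowed |= STACK_COMMANDS.get(stack, set())
    return allowed


def load_or_build(project_dir: Path, stacks: list[str], custom: set[str]) -> set[str]:
    """Reuse the cached allowlist if present, otherwise build and cache it."""
    cache = project_dir / CACHE_FILE
    if cache.exists():
        return set(json.loads(cache.read_text())["allowed_commands"])
    allowed = build_allowlist(stacks, custom)
    cache.write_text(json.dumps({"allowed_commands": sorted(allowed)}, indent=2))
    return allowed


def is_allowed(command_line: str, allowed: set[str]) -> bool:
    """Check only the first token; real validators also handle pipes, chains, etc."""
    tokens = shlex.split(command_line)
    return bool(tokens) and tokens[0] in allowed
```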

**IPC Handlers:**
- Purpose: Bridge between Electron renderer and backend services
- Examples: `apps/frontend/src/main/ipc-handlers/`
- Pattern: Domain-specific handler modules registered via `ipc-setup.ts`

Comment on lines +106 to +132

🧹 Nitpick | 🔵 Trivial

Consider documenting platform abstraction utilities as a key pattern.

Given the PR's focus on Windows CLI path normalization and the retrieved learnings about platform abstraction, consider adding a "Platform Abstraction" entry here documenting the centralized platform utilities (isWindows(), isMacOS(), normalizeExecutablePath(), findExecutable(), etc.) from the platform module. This would help developers understand this cross-cutting concern at the architectural level.

Based on learnings, the codebase emphasizes using platform abstraction functions instead of direct process.platform checks.

📝 Suggested addition

```diff
 **IPC Handlers:**
 - Purpose: Bridge between Electron renderer and backend services
 - Examples: `apps/frontend/src/main/ipc-handlers/`
 - Pattern: Domain-specific handler modules registered via `ipc-setup.ts`
+
+**Platform Abstraction:**
+- Purpose: Cross-platform compatibility for paths, executables, and OS detection
+- Examples: `apps/frontend/src/main/platform/`
+- Pattern: Centralized utilities (`isWindows()`, `normalizeExecutablePath()`, `findExecutable()`) replace direct platform checks
```

🤖 Prompt for AI Agents
In @.planning/codebase/ARCHITECTURE.md around lines 106 - 132, Add a "Platform
Abstraction" key abstraction that documents the centralized platform utilities
(e.g., functions isWindows(), isMacOS(), normalizeExecutablePath(),
findExecutable() from the platform module), describe their purpose as
centralizing OS differences and CLI/path normalization for cross-cutting use,
give example references to the platform module and call sites that replaced raw
process.platform checks, and state the pattern: prefer these utilities over
direct process.platform checks to ensure consistent, testable handling of
platform-specific behavior and path normalization across the codebase.

## Entry Points

**Backend CLI:**
- Location: `apps/backend/run.py`
- Triggers: Terminal, frontend subprocess spawn, direct invocation
- Responsibilities: Argument parsing, command routing to `cli/` modules

**Electron Main:**
- Location: `apps/frontend/src/main/index.ts`
- Triggers: Application launch
- Responsibilities: Window creation, IPC setup, service initialization

**Renderer Entry:**
- Location: `apps/frontend/src/renderer/main.tsx`
- Triggers: Window load
- Responsibilities: React app mount, store initialization

**Spec Pipeline:**
- Location: `apps/backend/spec/pipeline/orchestrator.py:SpecOrchestrator`
- Triggers: CLI `--task`, frontend task creation
- Responsibilities: Dynamic phase execution for spec creation

**Agent Loop:**
- Location: `apps/backend/agents/coder.py:run_autonomous_agent()`
- Triggers: CLI `--spec 001`, frontend build start
- Responsibilities: Subtask iteration, session management, recovery

## Error Handling

**Strategy:** Multi-level error handling with recovery support

**Patterns:**
- Agent sessions: `RecoveryManager` tracks state for resumption after interruption
- Security validation: Pre-tool-use hooks reject dangerous commands before execution
- QA loop: Escalation to human review after max iterations (`MAX_QA_ITERATIONS`)
- Git operations: Retry with exponential backoff for network errors
- Frontend: Error boundaries with toast notifications
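
The git retry policy above can be sketched as a small wrapper. This is a minimal sketch under assumptions: the attempt count, base delay, jitter, and the choice of `ConnectionError` as the "network error" type are illustrative, not the backend's actual settings. `sleep` is injectable so the policy can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
            # add up to 10% random jitter so concurrent retries do not fire in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Errors outside `retry_on` (for example a merge conflict surfaced as a different exception type) propagate immediately, since retrying them would not help.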

## Cross-Cutting Concerns

**Logging:**
- Backend: Python `logging` module with task-specific loggers (`task_logger/`)
- Frontend: Electron app logger (`app-logger.ts`), Sentry integration

**Validation:**
- Command security: `security/` validators with dynamic allowlists
- Spec validation: `spec/validate_pkg/` for implementation plan schema
- Tool input: `security/tool_input_validator.py` for Claude tool arguments

**Authentication:**
- OAuth flow: `core/auth.py` manages Claude OAuth tokens
- Token storage: Keychain (macOS), Credential Manager (Windows), encrypted file (Linux)
- Token validation: Pre-SDK-call validation to prevent encrypted token errors

**Internationalization:**
- Frontend: `react-i18next` with namespace-organized JSON files
- Location: `apps/frontend/src/shared/i18n/locales/{en,fr}/`

---

*Architecture analysis: 2026-01-19*
Comment on lines +191 to +193

⚠️ Potential issue | 🟡 Minor

Fix markdown formatting for the timestamp.

The timestamp uses italic emphasis, which markdown linters flag as unconventional. Use regular text instead of emphasis for metadata.

📝 Proposed fix

```diff
 ---
 
-*Architecture analysis: 2026-01-19*
+Architecture analysis: 2026-01-19
```

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

193-193: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🤖 Prompt for AI Agents
In @.planning/codebase/ARCHITECTURE.md around lines 191 - 193, The timestamp
line currently uses markdown emphasis markers ("*Architecture analysis:
2026-01-19*"); remove the surrounding asterisks so the metadata appears as plain
text (e.g., Architecture analysis: 2026-01-19) to satisfy markdown linters and
keep the content un-emphasized in ARCHITECTURE.md.
