
fix(e2e): resolve plugin load error and SDK timeout warnings#312

Merged
sjnims merged 2 commits into main from fix/289-290-e2e-test-failures
Jan 19, 2026

sjnims (Owner) commented Jan 19, 2026

Description

Fix two E2E test failures that were preventing the test suite from passing:

  1. Plugin load error (#289, [Bug]: E2E test fails with plugin load error "Cannot read properties of undefined (reading 'find')"): The isSystemMessage type guard in sdk-client.ts only checked type === "system", but the SDK has multiple system message types (init, status, hook_response, etc.). Only the init message has the plugins array, so when a non-init system message arrived first, the plugin lookup ran against a message with no plugins array and threw "Cannot read properties of undefined (reading 'find')".

  2. SDK timeout warning (#290, [Bug]: E2E test shows "timeout must be an integer" SDK warning causing retries): The E2E config helper was missing the api_timeout_ms and temperature fields for the generation and evaluation configs, so undefined values were passed to the SDK.

  3. Test file updates: Updated runExecution and runEvaluation calls to use options objects, matching the function signature changes from the recent refactoring in #311 (refactor: convert 5-6 parameter functions to options objects).
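
The failure mode in item 1 can be reproduced in isolation. The sketch below is illustrative only: the message types and the function names (isSystemLoose, isSystemInit, findPluginLoose, findPluginStrict) are simplified stand-ins invented for this example, not the SDK's or the project's actual definitions.

```typescript
// Simplified stand-ins for the SDK message types (hypothetical shapes).
type InitMessage = { type: "system"; subtype: "init"; plugins: { name: string }[] };
type StatusMessage = { type: "system"; subtype: "status" };
type AssistantMessage = { type: "assistant"; text: string };
type Message = InitMessage | StatusMessage | AssistantMessage;

// Old behaviour: any system message is treated as the init message.
function isSystemLoose(msg: Message): boolean {
  return msg.type === "system";
}

// New behaviour: only the init subtype qualifies.
function isSystemInit(msg: Message): msg is InitMessage {
  return msg.type === "system" && msg.subtype === "init";
}

// A stream in which a status message arrives before init.
const stream: Message[] = [
  { type: "system", subtype: "status" },
  { type: "system", subtype: "init", plugins: [{ name: "demo" }] },
];

function findPluginLoose(messages: Message[], name: string): string {
  // The loose guard picks the status message, which has no plugins array.
  const first = messages.find(isSystemLoose) as InitMessage | undefined;
  try {
    // plugins is undefined here, so calling .find on it throws a TypeError.
    return first?.plugins.find((p) => p.name === name)?.name ?? "missing";
  } catch (err) {
    return `error: ${(err as Error).message}`;
  }
}

function findPluginStrict(messages: Message[], name: string): string {
  // The strict guard skips the status message and lands on init.
  const init = messages.find(isSystemInit);
  return init?.plugins.find((p) => p.name === name)?.name ?? "missing";
}

console.log(findPluginLoose(stream, "demo"));
console.log(findPluginStrict(stream, "demo"));
```

With the status message first, the loose variant reports an error while the strict variant finds the plugin, which is the ordering sensitivity described above.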

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Performance optimization (improves efficiency without changing behavior)
  • Refactoring (code change that neither fixes a bug nor adds a feature)
  • Test (adding or updating tests)
  • Documentation update (improvements to README, CLAUDE.md, or inline docs)
  • Configuration change (changes to config.yaml, eslint, tsconfig, etc.)

Component(s) Affected

Pipeline Stages

  • Stage 1: Analysis (src/stages/1-analysis/)
  • Stage 2: Generation (src/stages/2-generation/)
  • Stage 3: Execution (src/stages/3-execution/)
  • Stage 4: Evaluation (src/stages/4-evaluation/)

Core Infrastructure

  • CLI (src/cli/)
  • Entry Point (src/index.ts)
  • Configuration (src/config/)
  • State Management (src/state/)
  • Types (src/types/)
  • Utilities (src/utils/)

Other

  • Tests (tests/)
  • Documentation (CLAUDE.md, README.md)
  • Configuration files (config.yaml, eslint.config.js, tsconfig.json, etc.)
  • GitHub templates/workflows (.github/)
  • Other (please specify):

Motivation and Context

E2E tests were failing with two distinct errors:

  1. "Cannot read properties of undefined (reading 'find')" during plugin load
  2. "timeout must be an integer" SDK warnings causing retries

Both issues only manifested during E2E tests with real API calls because unit tests use mocks that don't expose these edge cases.

Fixes #289
Fixes #290

How Has This Been Tested?

Test Configuration:

  • Node.js version: v25.x
  • OS: macOS

Test Steps:

  1. npm run typecheck - passes
  2. npm run lint - passes
  3. npm run format - no changes
  4. npm run knip - no dead code
  5. npm run jscpd - no new duplicates
  6. npm run madge - no circular dependencies
  7. RUN_E2E_TESTS=true npm test -- tests/e2e/pipeline.test.ts - 4 passed, 1 skipped

Checklist

General

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors

TypeScript / Code Quality

  • All functions have explicit return types
  • Strict TypeScript checks pass (npm run typecheck)
  • ESM import/export patterns used correctly
  • Unused parameters prefixed with _
  • No any types without justification

Documentation

  • I have updated CLAUDE.md if behavior or commands changed (N/A - no behavior change)
  • I have updated inline JSDoc comments where applicable
  • I have verified all links work correctly

Linting

  • I have run npm run lint and fixed all issues
  • I have run npm run format:check
  • I have run markdownlint "*.md" on Markdown files (N/A - no MD changes)
  • I have run uvx yamllint -c .yamllint.yml on YAML files (if modified) (N/A)
  • I have run actionlint on workflow files (if modified) (N/A)

Testing

  • I have run npm test and all tests pass
  • I have added tests for new functionality (N/A - bug fix)
  • Test coverage meets thresholds (78% lines, 75% functions, 65% branches)
  • I have tested with a sample plugin (if applicable)

Stage-Specific Checks

Stage 3: Execution
  • Claude Agent SDK integration works correctly
  • Tool capture via PreToolUse hooks functions properly
  • Timeout handling works as expected
  • Session isolation prevents cross-contamination
  • Permission bypass works for automated execution

Example Output (if applicable)

✓ tests/e2e/pipeline.test.ts (5 tests | 1 skipped) 54058ms
     ✓ runs full evaluation pipeline for all component types 26600ms
     ✓ correctly identifies non-triggering prompts 18997ms
     ✓ handles budget limits gracefully 8448ms
     ✓ generates deterministic scenarios for commands and hooks 13ms
     ↓ runs complete pipeline for MCP servers

 Test Files  1 passed (1)
      Tests  4 passed | 1 skipped (5)

Additional Notes

The root causes were:

  1. SDK sends multiple system message types (init, status, hook_response, etc.) but only init has the plugins array
  2. E2E config helper constructed configs manually without going through Zod schema validation, so defaults weren't applied
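
The second root cause can be sketched without Zod. The example below is an assumption-laden illustration: parseGenerationConfig is a hand-rolled, hypothetical stand-in for a schema parse that applies defaults, the model field is invented for the example, and the cast is just one way a manually built config could skip that step. The default values are the ones quoted for the generation config in the review of this PR.

```typescript
// Hypothetical config shape; only api_timeout_ms and temperature come from the PR.
interface GenerationConfig {
  model: string;
  api_timeout_ms: number;
  temperature: number;
}

type GenerationInput = { model: string; api_timeout_ms?: number; temperature?: number };

// Defaults quoted for the generation config in this PR's review.
const GENERATION_DEFAULTS = { api_timeout_ms: 60000, temperature: 0.3 };

// Stand-in for a schema parse: missing fields are filled from the defaults.
function parseGenerationConfig(input: GenerationInput): GenerationConfig {
  return {
    model: input.model,
    api_timeout_ms: input.api_timeout_ms ?? GENERATION_DEFAULTS.api_timeout_ms,
    temperature: input.temperature ?? GENERATION_DEFAULTS.temperature,
  };
}

// Built by hand: the cast satisfies the compiler, but no defaults are applied.
const manual = { model: "example-model" } as GenerationConfig;

// Built through the parse step: defaults are filled in.
const parsed = parseGenerationConfig({ model: "example-model" });

console.log(manual.api_timeout_ms, parsed.api_timeout_ms);
```

The manually built object leaves api_timeout_ms undefined at runtime even though its static type says number, which is the kind of gap a unit test with mocks would not surface.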

Reviewer Notes

Areas that need special attention:

  • The type cast in isSystemMessage is necessary because the SDK's type union doesn't expose subtype on the base type

Known limitations or trade-offs:

  • None

🤖 Generated with Claude Code

Fix two E2E test failures:

1. Plugin load error (#289): The `isSystemMessage` type guard only
   checked `type === "system"` but the SDK has multiple system message
   types (init, status, hook_response, etc.). Only the `init` message
   has the `plugins` array. Added check for `subtype === "init"`.

2. SDK timeout warning (#290): E2E config helper was missing
   `api_timeout_ms` and `temperature` fields for generation and
   evaluation configs. Added these fields with schema defaults.

3. Test file updates: Updated `runExecution` and `runEvaluation` calls
   to use options objects (matching the function signature changes from
   the recent refactoring in #311).

Fixes #289
Fixes #290

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude bot (Contributor) commented Jan 19, 2026

Claude finished @sjnims's task in 1m 39s


PR Review: fix(e2e): resolve plugin load error and SDK timeout warnings

Todo List

  • Run quality checks (typecheck, lint, format, knip)
  • Review PR diff
  • Analyze changed files
  • Check against review criteria
  • Post inline comments for specific issues
  • Provide summary verdict

Verdict: ✅ Looks good - Approve

All quality checks pass and the fixes are sound. The PR correctly addresses both E2E test failures with appropriate solutions.


Quality Checks

| Check | Result |
| --- | --- |
| npm run typecheck | ✅ Passed |
| npm run lint | ✅ Passed |
| npm run format:check | ✅ Passed |
| npm run knip | ✅ Passed |

What's Good

1. Correct Root Cause Analysis

  • The isSystemMessage fix properly addresses the issue where the SDK sends multiple system message types (init, status, hook_response, etc.) but only init has the plugins array
  • The type cast (msg as { subtype?: string }).subtype === "init" is necessary because the SDK's type union doesn't expose subtype on the base type
  • Good documentation explains why this narrowing is needed

2. Proper Default Values

  • The E2E config helper now includes the correct schema defaults:
    • Generation: api_timeout_ms: 60000, temperature: 0.3 (matches schema defaults)
    • Evaluation: api_timeout_ms: 120000, temperature: 0.1 (matches schema defaults)
  • This prevents undefined from being passed to the SDK, which was causing the timeout warnings
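
One possible way to catch an undefined or non-integer timeout before it reaches a client is a small boundary check. This is a sketch only: toTimeoutMs is a hypothetical helper and is not part of this PR, the project, or the SDK.

```typescript
// Hypothetical boundary check: accept only positive integers as timeouts.
function toTimeoutMs(value: unknown, label: string): number {
  if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) {
    throw new TypeError(`${label} must be a positive integer, got ${String(value)}`);
  }
  return value;
}

// A config object that is missing its timeout, as in the original E2E helper.
const evaluation: { api_timeout_ms?: number } = {};

try {
  toTimeoutMs(evaluation.api_timeout_ms, "evaluation.api_timeout_ms");
} catch (err) {
  // Fails here, at the boundary, rather than as a warning inside the client.
  console.log((err as Error).message);
}

// The evaluation default listed above passes the check unchanged.
console.log(toTimeoutMs(120000, "evaluation.api_timeout_ms"));
```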

3. Consistent Refactoring

4. Good Testing

  • Author ran E2E tests with real API calls: 4 passed, 1 skipped
  • This verifies both fixes work in the actual E2E environment where these issues manifest

Suggestions

Minor: Consider Type Safety Enhancement (Optional)

The type cast in isSystemMessage works correctly but could potentially be made more type-safe in the future. Consider documenting this in a TODO comment if the SDK types might be improved upstream:

/**
 * Type guard for system init message (the one with plugins).
 * The SDK has multiple system message types (init, status, hook_response, etc.),
 * but only 'init' has the plugins array we need.
 * 
 * TODO: If SDK exports subtype on base type, remove cast
 */
export function isSystemMessage(msg: SDKMessage): msg is SDKSystemMessage {
  return (
    msg.type === "system" && (msg as { subtype?: string }).subtype === "init"
  );
}

However, this is purely optional - the current implementation is correct and the comment adequately explains the situation.


Architecture Review

Stage 3 - Execution

  • Proper Agent SDK usage: The fix correctly narrows to the specific system message subtype needed
  • Type guard follows existing patterns in the codebase
  • No breaking changes to the execution flow

Testing Patterns

  • E2E config helper properly constructs configs with all required fields
  • Test updates maintain consistency with recent refactoring
  • Coverage should remain at or above thresholds

Summary

This is a well-executed bug fix that:

  1. Correctly identifies and fixes the plugin load error by narrowing to init subtype
  2. Prevents SDK timeout warnings by providing proper default values
  3. Maintains consistency with recent refactoring (refactor: convert 5-6 parameter functions to options objects #311)
  4. Passes all quality checks and E2E tests

Recommendation: Merge


sjnims self-assigned this Jan 19, 2026
claude bot added the bug (Something isn't working), effort:small (< 1 hour), stage:execution (Stage 3: Agent SDK execution with tool capture), and sdk:agent (Claude Agent SDK usage (Stage 3)) labels Jan 19, 2026

claude bot (Contributor) commented Jan 19, 2026

CI Failure Analysis: Test

Run: #21151522900 | Commit: 15a76e0

Summary

The test suite failed with 1 test failure in sdk-client.test.ts. The isSystemMessage type guard is returning false when it should return true for a minimal system message object. Additionally, there's an unhandled error about a missing test environment.

Failures Found

| File | Line | Issue |
| --- | --- | --- |
| tests/unit/stages/3-execution/sdk-client.test.ts | 247 | isSystemMessage(msg) returns false but expected true for minimal system message |
| tests/unit/stages/2-generation/agent-scenario-generator.test.ts | N/A | Unhandled Error: Cannot find environment for test file |

Root Cause

The isSystemMessage type guard function is likely failing to correctly identify a minimal system message object. The test at line 247 creates a minimal message and expects the type guard to return true, but it's returning false instead.

This suggests either:

  1. The type guard's validation logic is too strict or incorrect
  2. The minimal message structure doesn't match what the type guard expects
  3. Recent changes to the SDK types or message structure aren't reflected in the type guard

How to Fix

  1. Review the type guard implementation in src/stages/3-execution/sdk-client.ts (or wherever isSystemMessage is defined)
  2. Check the test's message structure at lines 242-247 to see what properties it's setting
  3. Ensure the type guard matches the SDK's actual SystemMessage type - it may need to be updated if the SDK changed

Run locally to debug:

npm test sdk-client.test.ts

You can also run just the failing test:

npm test -- --reporter=verbose -t "returns true for minimal system message"

Test Results Summary

  • Test Files: 1 failed | 51 passed | 1 skipped (63 total)
  • Tests: 1 failed | 1252 passed | 5 skipped (1275 total)

Analyzed by Claude

The isSystemMessage type guard now checks for subtype === "init" since
only init messages have the plugins array. Updated tests accordingly:

- Renamed "minimal system message" test to "minimal system init message"
- Added subtype: "init" to make it a valid init message
- Added new test verifying non-init system messages return false

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

sjnims (Owner, Author) commented Jan 19, 2026

Fixed CI failure: Updated the unit test for isSystemMessage to reflect the new requirement that system messages must have subtype === 'init' to be considered valid init messages (since only init messages have the plugins array).

Changes in a100552:

  • Renamed test from "minimal system message" to "minimal system init message"
  • Added subtype: 'init' to make it a valid init message
  • Added new test verifying non-init system messages (like status) return false

claude bot (Contributor) commented Jan 19, 2026

Claude finished @sjnims's task in 2m 35s


Todo List

  • Run quality checks (typecheck, lint, format, knip)
  • Review PR diff
  • Analyze changed files
  • Check against review criteria
  • Post inline comments for specific issues
  • Provide summary verdict

Verdict: ✅ Looks good - Approve

This PR correctly fixes both E2E test failures with well-reasoned solutions. All quality checks pass and CI is green.


Quality Checks

| Check | Result |
| --- | --- |
| npm run typecheck | ✅ Passed |
| npm run lint | ✅ Passed |
| npm run format:check | ✅ Passed |
| npm run knip | ✅ Passed |
| CI Status | ✅ All passing |

What's Good

1. Correct Root Cause Analysis & Fix

The isSystemMessage type guard fix properly addresses the issue where the SDK sends multiple system message types (init, status, hook_response, etc.) but only init has the plugins array that plugin-loader.ts:369 needs.

Before (lines 275-278):

export function isSystemMessage(msg: SDKMessage): msg is SDKSystemMessage {
  return msg.type === "system";
}

After:

export function isSystemMessage(msg: SDKMessage): msg is SDKSystemMessage {
  return (
    msg.type === "system" && (msg as { subtype?: string }).subtype === "init"
  );
}

The type cast (msg as { subtype?: string }) is necessary because the SDK's SDKMessage type union doesn't expose subtype on the base type, even though it exists at runtime. The updated JSDoc comment clearly explains this constraint.

2. Proper Schema Default Values

The E2E config helper (tests/e2e/helpers.ts) now includes the correct schema defaults, preventing undefined from being passed to the SDK:

| Config | Field | Value | Schema Default |
| --- | --- | --- | --- |
| Generation | api_timeout_ms | 60000 | ✅ Matches src/config/schema.ts:60000 |
| Generation | temperature | 0.3 | ✅ Matches src/config/schema.ts:0.3 |
| Evaluation | api_timeout_ms | 120000 | ✅ Matches src/config/schema.ts:120000 |
| Evaluation | temperature | 0.1 | ✅ Matches src/config/schema.ts:0.1 |

3. Consistent Refactoring

Test file updates correctly use options objects for runExecution and runEvaluation, matching the function signature changes from #311. All 5 call sites updated consistently:

  • tests/e2e/pipeline.test.ts:138-142 (positive scenarios)
  • tests/e2e/pipeline.test.ts:161-168 (evaluation)
  • tests/e2e/pipeline.test.ts:247-254 (negative scenarios)
  • tests/e2e/pipeline.test.ts:309-315 (budget limits)
  • tests/e2e/pipeline.test.ts:406-422 (MCP server pipeline)
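
The general shape of a positional-to-options conversion can be sketched as follows. Everything here is hypothetical: the PR does not show the real runExecution signature, so the parameter names, defaults, and both function names below are assumptions chosen only to illustrate the pattern.

```typescript
// Hypothetical options shape; the project's real fields are not shown in this PR.
interface ExecutionOptions {
  pluginPath: string;
  scenarios: string[];
  timeoutMs: number;
  maxBudgetUsd?: number;
  verbose?: boolean;
}

// Before: five positional parameters, easy to pass in the wrong order.
function runExecutionPositional(
  pluginPath: string,
  scenarios: string[],
  timeoutMs: number,
  maxBudgetUsd: number,
  verbose: boolean,
): string {
  return `${pluginPath}:${scenarios.length}:${timeoutMs}:${maxBudgetUsd}:${verbose}`;
}

// After: one options object, with defaults applied while destructuring.
function runExecutionWithOptions({
  pluginPath,
  scenarios,
  timeoutMs,
  maxBudgetUsd = 1,
  verbose = false,
}: ExecutionOptions): string {
  return `${pluginPath}:${scenarios.length}:${timeoutMs}:${maxBudgetUsd}:${verbose}`;
}

const before = runExecutionPositional("./plugin", ["a", "b"], 60000, 1, false);
const after = runExecutionWithOptions({
  pluginPath: "./plugin",
  scenarios: ["a", "b"],
  timeoutMs: 60000,
});

console.log(before === after);
```

Call sites written against the positional form stop compiling once the signature changes, which is why each call in the test file had to be updated.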

4. Comprehensive Test Coverage

The unit test updates in sdk-client.test.ts demonstrate thorough testing:

  • ✅ Minimal system init message (with subtype: 'init') returns true
  • ✅ System status messages (with subtype: 'status') return false
  • ✅ System messages without subtype return false

This ensures the type guard correctly distinguishes between different system message subtypes.

5. Good Documentation

Both the JSDoc comment and the PR description clearly explain:

  • Why the change was needed (multiple system message types)
  • Which message type has the plugins array (init only)
  • Why the type cast is necessary (SDK types don't expose subtype)

Architecture Review

Stage 3 - Execution

  • Proper Agent SDK usage: The fix correctly narrows to the specific system message subtype needed by plugin-loader.ts:368-371
  • Type guard follows existing patterns and is used consistently across:
    • plugin-loader.ts:148 - Processes init message with plugins array
    • transcript-builder.ts - Finds init message for transcript
  • No breaking changes to execution flow
  • Session isolation and permission bypass unchanged

Testing Patterns

TypeScript Quality

  • No any types without justification
  • Type cast is documented and necessary
  • All imports use .js extension (ESM requirement)
  • Explicit return types maintained

Code Pattern Compliance

| Pattern | Status | Notes |
| --- | --- | --- |
| Error Handling | | Uses typed errors via processInitMessage |
| Config Validation | | Defaults now match Zod schema |
| Type Guards | | Follows existing pattern (see isResultMessage, isErrorMessage) |
| Testing | | Added tests for new subtype checking behavior |

Minor Observations

Type Safety Enhancement (Optional)

The type cast in isSystemMessage is correct and well-documented. If the SDK upstream eventually exposes subtype on the base SDKMessage type, this cast could be removed. Consider tracking this as a low-priority upstream improvement:

// Future improvement if SDK exports subtype on base type:
export function isSystemMessage(msg: SDKMessage): msg is SDKSystemMessage {
  return msg.type === "system" && msg.subtype === "init";
}

However, this is purely optional - the current implementation is correct and properly documented.
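
Until then, an in check is one possible alternative to the cast. Whether it applies depends on how the SDK declares its union; the sketch below uses simplified stand-in types (BaseSystem, InitSystem, Other, and isInit are assumptions for illustration, not the SDK's definitions), where one system member of the union declares no subtype field.

```typescript
// Simplified stand-ins (hypothetical); BaseSystem declares no subtype.
type BaseSystem = { type: "system" };
type InitSystem = { type: "system"; subtype: "init"; plugins: string[] };
type Other = { type: "assistant" };
type Msg = BaseSystem | InitSystem | Other;

function isInit(msg: Msg): msg is InitSystem {
  // The "in" check narrows to the members that declare subtype, so no cast is needed.
  return msg.type === "system" && "subtype" in msg && msg.subtype === "init";
}

// A runtime status message that the static types do not describe.
const statusMsg: Msg = JSON.parse('{"type":"system","subtype":"status"}');

console.log(isInit({ type: "system", subtype: "init", plugins: [] }));
console.log(isInit(statusMsg));
console.log(isInit({ type: "system" }));
```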


Summary

This PR successfully resolves two distinct E2E test failures:

  1. Plugin load error (#289, [Bug]: E2E test fails with plugin load error "Cannot read properties of undefined (reading 'find')"): Fixed by narrowing isSystemMessage to match only init subtype messages, which are the ones that carry the plugins array
  2. SDK timeout warning (#290, [Bug]: E2E test shows "timeout must be an integer" SDK warning causing retries): Fixed by providing proper default values (api_timeout_ms, temperature) in the E2E config helper
  3. Test consistency: Updated all runExecution/runEvaluation calls to use options objects, matching #311 (refactor: convert 5-6 parameter functions to options objects)

The fixes are:

  • ✅ Well-reasoned and correctly implemented
  • ✅ Thoroughly tested (unit tests + E2E tests passing)
  • ✅ Properly documented
  • ✅ Architecturally sound
  • ✅ Pass all quality checks

Recommendation: Merge


claude bot added the effort:medium (1-4 hours) and test:unit (Unit test changes (tests/unit/)) labels and removed the effort:small (< 1 hour) label Jan 19, 2026

codecov bot commented Jan 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


@sjnims sjnims merged commit 5c04d28 into main Jan 19, 2026
13 checks passed
@sjnims sjnims deleted the fix/289-290-e2e-test-failures branch January 19, 2026 21:18

Labels

  • bug: Something isn't working
  • effort:medium: 1-4 hours
  • sdk:agent: Claude Agent SDK usage (Stage 3)
  • stage:execution: Stage 3: Agent SDK execution with tool capture
  • test:unit: Unit test changes (tests/unit/)
