Skip to content

Conversation

@rodrigo-olivares
Copy link
Collaborator

Summary

This PR fixes the issue where large agent definitions would silently fail to register. Previously, agents were passed via the --agents CLI flag which has platform-specific size limits (ARG_MAX). When agents exceeded these limits, a temp file workaround with @filepath was used, but the CLI silently failed to parse it.

Changes

This aligns the Python SDK with the TypeScript SDK approach:

  • Always use streaming mode internally (--input-format stream-json) - even for string prompts
  • Send agents via the initialize control request through stdin (no size limits)
  • Write string prompts to stdin after initialize (instead of using --print)
  • Remove the --agents CLI flag and temp file handling entirely

Why this works

The TypeScript SDK always uses the control protocol with stdin/stdout. The initialize request is sent via stdin which has no ARG_MAX constraints, allowing arbitrarily large agent definitions.

Testing

  • Added E2E tests with 260KB+ agent payloads (20 agents × 13KB prompts)
  • Verified agents are registered correctly in both ClaudeSDKClient and query() function
  • Tested WITHOUT the fix to confirm the old behavior fails silently (0/20 agents registered)
  • All 132 unit tests pass
  • All 8 E2E agent tests pass

Before/After

Scenario Before After
260KB agents via ClaudeSDKClient ❌ 0/20 registered (silent failure) ✅ 20/20 registered
260KB agents via query() ❌ 0/20 registered (silent failure) ✅ 20/20 registered

Closes the issue raised in https://github.com/anthropics/claude-cli-internal/pull/13749

…t SDK

Previously, agent definitions were passed via the --agents CLI flag, which
had platform-specific size limits (ARG_MAX). Large agent definitions would
trigger a temp file workaround with @filepath that silently failed.

This change aligns the Python SDK with the TypeScript SDK by:
- Always using streaming mode internally (--input-format stream-json)
- Sending agents via the initialize control request through stdin
- Writing string prompts to stdin after initialize (instead of --print)
- Removing the --agents CLI flag and temp file handling entirely

This eliminates size limits for agent definitions since stdin has no
ARG_MAX constraints, fixing the silent failure issue with large agents.

Claude-Generated-By: Claude Code (cli/claude-opus-4-5=100%)
Claude-Steers: 19
Claude-Permission-Prompts: 16
Claude-Escapes: 0
Claude-Plan:
<claude-plan>
# Plan: Migrate Agent Definitions to Initialize Method

## Summary

Migrate the Python SDK from passing agent definitions via CLI `--agents` flag to using the initialize control request via stdin, matching the TypeScript SDK pattern. This avoids ARG_MAX/command line length limits and eliminates the need for temporary file workarounds.

## Background

**Current Python SDK approach:**
- Passes agents via `--agents` CLI flag with JSON
- When command line exceeds platform limits (8000 on Windows, 100000 on others), writes JSON to temp file and uses `@filepath` pattern
- This approach is fragile and was identified as an "antipattern" by the Claude Code team

**TypeScript SDK approach:**
- Sends agents, systemPrompt, and appendSystemPrompt via the `initialize` control request through stdin
- The initialize request is part of the bidirectional control protocol
- No temporary files needed

## Critical Design Decision: Non-Streaming Mode

The `initialize()` request is only sent in **streaming mode** (when prompt is an AsyncIterable). In non-streaming mode (string prompt with `--print`), initialize is skipped.

**Approach:**
- **Streaming mode**: Send agents, systemPrompt, appendSystemPrompt via initialize request
- **Non-streaming mode**: Keep CLI flags (`--agents`, `--system-prompt`, etc.) as fallback
- Remove the temp file workaround entirely (following TypeScript SDK pattern)

## Files to Modify

1. **`src/claude_agent_sdk/types.py`** - Update `SDKControlInitializeRequest` type
2. **`src/claude_agent_sdk/_internal/query.py`** - Update Query to accept and send config via initialize
3. **`src/claude_agent_sdk/_internal/client.py`** - Pass new options to Query
4. **`src/claude_agent_sdk/client.py`** - Pass new options to Query
5. **`src/claude_agent_sdk/_internal/transport/subprocess_cli.py`** - Conditionally skip CLI flags in streaming mode
6. **`tests/test_transport.py`** - Update tests for build_command behavior
7. **`e2e-tests/test_agents_and_settings.py`** - Verify existing agent tests still pass

## Implementation Steps

### Step 1: Update `SDKControlInitializeRequest` type (`types.py`)

Add new optional fields to match the CLI's schema. Note: TypedDict keys must match the JSON field names (camelCase):

```python
class SDKControlInitializeRequest(TypedDict):
    subtype: Literal["initialize"]
    hooks: NotRequired[dict[HookEvent, Any] | None]
    sdkMcpServers: NotRequired[list[str]]  # SDK MCP server names
    jsonSchema: NotRequired[dict[str, Any]]  # For structured output
    systemPrompt: NotRequired[str]
    appendSystemPrompt: NotRequired[str]
    agents: NotRequired[dict[str, dict[str, Any]]]  # Agent definitions as dict
```

### Step 2: Update Query class (`_internal/query.py`)

**2a. Update `__init__` to accept new parameters:**

```python
def __init__(
    self,
    transport: Transport,
    is_streaming_mode: bool,
    can_use_tool: ... | None = None,
    hooks: dict[str, list[dict[str, Any]]] | None = None,
    sdk_mcp_servers: dict[str, "McpServer"] | None = None,
    initialize_timeout: float = 60.0,
    # New parameters:
    system_prompt: str | None = None,
    append_system_prompt: str | None = None,
    agents: dict[str, dict[str, Any]] | None = None,
):
```

**2b. Update `initialize()` method to include new fields:**

```python
async def initialize(self) -> dict[str, Any] | None:
    # ... existing hooks_config building ...

    # Build SDK MCP server names list
    sdk_mcp_server_names = list(self.sdk_mcp_servers.keys()) if self.sdk_mcp_servers else None

    # Send initialize request with all config
    request = {
        "subtype": "initialize",
        "hooks": hooks_config if hooks_config else None,
        "sdkMcpServers": sdk_mcp_server_names,
        "systemPrompt": self._system_prompt,
        "appendSystemPrompt": self._append_system_prompt,
        "agents": self._agents,
    }

    # Remove None values to keep request clean
    request = {k: v for k, v in request.items() if v is not None}

    # ... rest of method ...
```

### Step 3: Update `InternalClient.process_query()` (`_internal/client.py`)

Extract system_prompt, append_system_prompt, and agents from options and pass to Query:

```python
# Extract system prompt info for initialize request
system_prompt = None
append_system_prompt = None
if isinstance(configured_options.system_prompt, str):
    system_prompt = configured_options.system_prompt
elif configured_options.system_prompt and configured_options.system_prompt.get("type") == "preset":
    append_system_prompt = configured_options.system_prompt.get("append")

# Convert agents to dict format
agents_dict = None
if configured_options.agents:
    agents_dict = {
        name: {k: v for k, v in asdict(agent_def).items() if v is not None}
        for name, agent_def in configured_options.agents.items()
    }

query = Query(
    transport=chosen_transport,
    is_streaming_mode=is_streaming,
    can_use_tool=configured_options.can_use_tool,
    hooks=...,
    sdk_mcp_servers=sdk_mcp_servers,
    system_prompt=system_prompt,
    append_system_prompt=append_system_prompt,
    agents=agents_dict,
)
```

### Step 4: Update `ClaudeSDKClient.connect()` (`client.py`)

Same pattern as InternalClient - extract and pass the new parameters to Query.

### Step 5: Update `SubprocessCLITransport._build_command()` (`subprocess_cli.py`)

The transport already knows if it's in streaming mode via `self._is_streaming` (set based on prompt type).

**5a. Conditional CLI flag handling:**
```python
# In _build_command():
# Only pass --agents via CLI if NOT in streaming mode
# (streaming mode sends via initialize request)
if self._options.agents and not self._is_streaming:
    agents_dict = {
        name: {k: v for k, v in asdict(agent_def).items() if v is not None}
        for name, agent_def in self._options.agents.items()
    }
    agents_json = json.dumps(agents_dict)
    cmd.extend(["--agents", agents_json])

# Similarly for system-prompt and append-system-prompt
```

**5b. Remove temp file handling entirely:**
Following the TypeScript SDK pattern, remove all temp file handling code:
- Delete lines 336-365 (temp file creation logic)
- Delete `self._temp_files` list initialization (line 67)
- Delete cleanup in `close()` method

The TypeScript SDK always uses streaming mode with initialize for agents. Non-streaming mode with large agents is an edge case that doesn't need special handling - streaming mode should be used instead.

### Step 6: Update tests

**Unit tests (`tests/test_transport.py`):**
- Update tests for `_build_command()` to verify agents/systemPrompt flags are only added in non-streaming mode

**E2E tests (`e2e-tests/test_agents_and_settings.py`):**
- Verify existing agent tests still pass (they use streaming mode)
- Consider adding a test with large agent definitions

**New unit tests for Query:**
- Test that `initialize()` includes agents in the request when provided
- Test that `initialize()` includes systemPrompt/appendSystemPrompt when provided

## Verification

1. **Linting and type checking:**
   ```bash
   python -m ruff check src/ tests/ --fix
   python -m ruff format src/ tests/
   python -m mypy src/
   ```

2. **Run unit tests:**
   ```bash
   python -m pytest tests/
   ```

3. **Run E2E tests:**
   ```bash
   python -m pytest e2e-tests/
   ```

4. **Manual testing:**
   - Test streaming mode with agents (should use initialize)
   - Test non-streaming mode with agents (should use CLI flag)
   - Test with a large agent definition that would previously trigger the temp file workaround
</claude-plan>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants