Version: 1.0
Date: 2026-03-17
Author: System Architect
Quality Score: 94/100
PRD Reference: docs/refactor/PRD.md
Status: Final
Architecture Style: Modular monolith (single Go module, layered packages)
Primary Stack: Go 1.24+ / anthropic-sdk-go / openai-go / MCP go-sdk / OTel
Dependency Direction: api -> {model, tool, middleware, hooks, config, message, sandbox, mcp, runtime/*}, never reverse
Naming Rule: Go standard: packages lowercase, types PascalCase, files snake_case, constants PascalCase
API Format: Go function calls (not HTTP REST -- this is an SDK)
Error Pattern: Sentinel errors at package level (var ErrXxx = errors.New(...)), wrapped with fmt.Errorf("pkg: action: %w", err)
Project Root: pkg/ with ~11 domain-based packages
Key Constraint: Zero new dependencies; <=20K non-test LOC; <=11 packages; single Model interface
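A minimal sketch of the Error Pattern entry: a package-level sentinel wrapped with "pkg: action: %w" context. ErrToolNotFound, lookup, and executeTool are hypothetical names used only for illustration, not v2 APIs. The point is that callers still match the sentinel with errors.Is after wrapping.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrToolNotFound is a hypothetical package-level sentinel error.
var ErrToolNotFound = errors.New("tool: not found")

// lookup is a hypothetical helper that always fails with the sentinel.
func lookup(name string) error { return ErrToolNotFound }

// executeTool adds "pkg: action" context and wraps the cause with %w.
func executeTool(name string) error {
	if err := lookup(name); err != nil {
		return fmt.Errorf("tool: execute %q: %w", name, err)
	}
	return nil
}

func main() {
	err := executeTool("bash")
	fmt.Println(err)                             // tool: execute "bash": tool: not found
	fmt.Println(errors.Is(err, ErrToolNotFound)) // true
}
```

Using %w (rather than %v) is what keeps the sentinel reachable through the wrap chain.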
agentsdk-go v2 is a big-bang rewrite that reduces the SDK from ~34K non-test lines across 24+ packages to ~15-20K lines across ~11 packages. The rewrite eliminates the dual Model interface, simplifies compaction into a prompt-compression step that strips tool I/O, consolidates event types from 16 to 7, middleware stages from 6 to 4, and built-in tools from 11 to 7.
The agent core loop (~120 lines in v1, pkg/agent/agent.go lines 70-189) is absorbed into pkg/api, calling model.Model.CompleteStream directly without a bridge adapter. The Runtime struct drops from 17+ fields to ~7 essential fields. ACP and task management are removed from core.
All architectural decisions trace directly to the PRD's KISS/YAGNI philosophy: every line must carry its weight.
- Single Interface, Zero Adapters: One model.Model interface with two methods. No bridge structs, no glue code. Rationale: the dual interface added ~200 lines of adapter code that obscured data flow (PRD G1).
- Compaction Is a Controlled Model Call: When compaction triggers, core performs a dedicated prompt-compression model call and strips tool I/O from the compressed portion. Rationale: tool output is high-token noise; compression preserves intent while keeping the preserved tail unmodified (PRD G3).
- Minimal Surface Area: 7 events, 4 middleware stages, 7 tools. Features outside this core set are deleted from v2 core and can be reintroduced by users as custom code. Rationale: every extension point is maintenance surface; fewer points means less drift (PRD G5).
- YOLO Default, Safety Hook: All tool executions are allowed by default. A Go-native safety hook blocks catastrophic commands. Users opt in to stricter security models. Rationale: the permission system added complexity for a problem most SDK users solve differently in their own infrastructure (PRD FR-9).
┌──────────────────────────────────────────┐
│ pkg/api │
│ (Runtime, agent loop, system prompt, │
│ compaction, request orchestration) │
└──┬───┬───┬───┬───┬───┬───┬───┬───┬──┬──┘
│ │ │ │ │ │ │ │ │ │
┌────────────┘ │ │ │ │ │ │ │ │ └──────────┐
▼ ▼ │ ▼ │ ▼ │ ▼ ▼ ▼
┌─────────┐ ┌──────────┤ ┌────┤ ┌────┤ ┌─────────┐ ┌──────────┐
│pkg/model│ │pkg/tool │ │pkg/│ │pkg/│ │pkg/hooks │ │pkg/config│
│ │ │ │ │mid │ │msg │ │ │ │ │
│ Model │ │ Registry │ │dle │ │ │ │ Executor │ │ Settings │
│ interf. │ │ Executor │ │ware│ │ │ │ Events │ │ Loader │
└─────────┘ │ builtin/ │ │ │ │ │ │ │ │ Rules │
└──────────┘ └────┘ └────┘ └──────────┘ └──────────┘
│ │
┌────┴────┐ ┌─────┴─────┐
▼ ▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌─────────┐ ┌────────┐
│pkg/mcp │ │pkg/sand- │ │pkg/run- │ │pkg/git-│
│ │ │box │ │time/ │ │ignore │
│ Client │ │ Manager │ │ skills/ │ │ │
└────────┘ └──────────┘ │ subag./ │ └────────┘
│ cmds/ │
└─────────┘
(No contrib/ in v2 core.)
Dependency Rules (strict, never reverse):
- pkg/api depends on all other pkg/* packages.
- pkg/tool/builtin depends on pkg/tool, pkg/sandbox, pkg/gitignore.
- pkg/hooks depends on nothing in pkg/ except its own event types.
- pkg/model depends on nothing in pkg/.
- pkg/message depends on nothing in pkg/.
- pkg/middleware depends on pkg/runtime/skills (for trace middleware skill logging only).
- pkg/config depends on nothing in pkg/.
- pkg/mcp depends on nothing in pkg/.
- pkg/sandbox depends on nothing in pkg/.
- pkg/gitignore depends on nothing in pkg/.
| # | Package | Responsibility | Absorbed From |
|---|---|---|---|
| 1 | pkg/api | Runtime struct, agent loop, compaction, system prompt, request orchestration | pkg/agent (233 lines), system prompt parts of pkg/prompts |
| 2 | pkg/model | Model interface, Anthropic provider, OpenAI provider, types | Stays |
| 3 | pkg/tool | Tool interface, Registry, Executor, Validator | Stays |
| 4 | pkg/tool/builtin | 7 built-in tools + helpers | Drops task/slash/askuser/todo/web tools |
| 5 | pkg/middleware | Middleware interface (4 stages), Chain, Trace, HTTP trace | Absorbs pkg/core/middleware (24 lines) |
| 6 | pkg/hooks | Hook Executor, Event types, Event payloads | Merges pkg/core/events + pkg/core/hooks |
| 7 | pkg/config | Settings types, SettingsLoader, RulesLoader, hot-reload | Stays |
| 8 | pkg/message | In-memory History, Message types, token counter | Stays (as-is per PRD) |
| 9 | pkg/sandbox | Filesystem/network isolation manager | Stays |
| 10 | pkg/mcp | MCP client session management | Stays |
| 11 | pkg/gitignore | .gitignore pattern matcher | Stays (as-is per PRD) |
| 12 | pkg/runtime/skills | Skills registry, loader, matcher, prompt templates | Absorbs skill-related templates from pkg/prompts |
| 13 | pkg/runtime/subagents | Subagent manager, definitions, dispatch | Stays |

Note on count: pkg/runtime/* sub-packages (skills, subagents) are counted as subdirectories under pkg/runtime. pkg/runtime/commands is deleted (slash commands removed per PRD). pkg/runtime/tasks is deleted (tasks removed). pkg/security is absorbed into pkg/hooks/safety.go (safety hook replaces the full permission system). The runtime/ directory is a logical group, not counted as a separate package.
Final directory count: api, model, tool, tool/builtin, middleware, hooks, config, message, sandbox, mcp, gitignore, runtime/skills, runtime/subagents = 13 directories. The plan flattens runtime/skills -> pkg/skills and runtime/subagents -> pkg/subagents, removing the runtime/ container directory. Flattening alone still leaves 13 package directories, so it does not by itself reach the <=11 target.
Location: pkg/model/interface.go
// Model is the provider-agnostic interface for LLM completion.
// Both Anthropic and OpenAI providers implement this directly.
// The agent loop calls CompleteStream for streaming, Complete for blocking.
type Model interface {
Complete(ctx context.Context, req Request) (*Response, error)
CompleteStream(ctx context.Context, req Request, cb StreamHandler) error
}
What changes from v1:
- agent.Model (with Generate(ctx, *Context) (*ModelOutput, error)) is deleted.
- The conversationModel bridge adapter in pkg/api/agent.go (lines 993-1081 in v1) is deleted.
- The agent loop in pkg/api calls model.CompleteStream directly, managing conversation history and tool call extraction inline.
Why: The bridge adapter converted between two representations of the same data (messages + tool calls). This is a pure translation layer with zero business logic. Removing it eliminates ~200 lines and one entire package (pkg/agent).
Location: pkg/tool/tool.go (unchanged)
type Tool interface {
Name() string
Description() string
Schema() *JSONSchema
Execute(ctx context.Context, params map[string]any) (*ToolResult, error)
}
Location: pkg/middleware/types.go
type Stage int
const (
StageBeforeAgent Stage = iota
StageBeforeTool
StageAfterTool
StageAfterAgent
)
type Middleware interface {
Name() string
BeforeAgent(ctx context.Context, st *State) error
BeforeTool(ctx context.Context, st *State) error
AfterTool(ctx context.Context, st *State) error
AfterAgent(ctx context.Context, st *State) error
}
// Funcs adapts function pointers to Middleware.
type Funcs struct {
Identifier string
OnBeforeAgent func(ctx context.Context, st *State) error
OnBeforeTool func(ctx context.Context, st *State) error
OnAfterTool func(ctx context.Context, st *State) error
OnAfterAgent func(ctx context.Context, st *State) error
}
What changes from v1:
- StageBeforeModel and StageAfterModel are removed. The BeforeModel() and AfterModel() methods are removed from the interface.
- Model-level interception is unnecessary: the model request is fully determined by messages + tools. Observability of model calls is achieved via BeforeAgent (sees the request) and AfterAgent (sees the response, including model usage).
Why: In v1, BeforeModel/AfterModel were used by trace middleware to observe model requests/responses. This can be done more naturally at the agent level: BeforeAgent fires once per request and AfterAgent fires once per response, giving the trace middleware a cleaner signal. Two fewer methods on the interface means less boilerplate for every middleware implementation.
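A minimal sketch of the 4-stage chain with short-circuit semantics and the Funcs adapter. State is a hypothetical stand-in for middleware.State, the error-wrapping format is an assumption, and the Chain.Execute signature is inferred from the middleware.Chain.Execute(BeforeAgent) calls in this document rather than taken from the v2 source.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// State is a hypothetical stand-in for middleware.State.
type State struct{ Log []string }

type Stage int

const (
	StageBeforeAgent Stage = iota
	StageBeforeTool
	StageAfterTool
	StageAfterAgent
)

type Middleware interface {
	Name() string
	BeforeAgent(ctx context.Context, st *State) error
	BeforeTool(ctx context.Context, st *State) error
	AfterTool(ctx context.Context, st *State) error
	AfterAgent(ctx context.Context, st *State) error
}

type stageFunc = func(ctx context.Context, st *State) error

// Funcs adapts function fields to Middleware; nil fields are no-ops.
type Funcs struct {
	Identifier    string
	OnBeforeAgent stageFunc
	OnBeforeTool  stageFunc
	OnAfterTool   stageFunc
	OnAfterAgent  stageFunc
}

func invoke(ctx context.Context, st *State, f stageFunc) error {
	if f == nil {
		return nil
	}
	return f(ctx, st)
}

func (f Funcs) Name() string                                     { return f.Identifier }
func (f Funcs) BeforeAgent(ctx context.Context, st *State) error { return invoke(ctx, st, f.OnBeforeAgent) }
func (f Funcs) BeforeTool(ctx context.Context, st *State) error  { return invoke(ctx, st, f.OnBeforeTool) }
func (f Funcs) AfterTool(ctx context.Context, st *State) error   { return invoke(ctx, st, f.OnAfterTool) }
func (f Funcs) AfterAgent(ctx context.Context, st *State) error  { return invoke(ctx, st, f.OnAfterAgent) }

// Chain runs middleware in registration order and stops at the first error.
type Chain []Middleware

func (c Chain) Execute(ctx context.Context, stage Stage, st *State) error {
	for _, m := range c {
		var err error
		switch stage {
		case StageBeforeAgent:
			err = m.BeforeAgent(ctx, st)
		case StageBeforeTool:
			err = m.BeforeTool(ctx, st)
		case StageAfterTool:
			err = m.AfterTool(ctx, st)
		case StageAfterAgent:
			err = m.AfterAgent(ctx, st)
		default:
			return fmt.Errorf("middleware: execute: unknown stage %d", stage)
		}
		if err != nil {
			return fmt.Errorf("middleware: %s: %w", m.Name(), err) // short-circuit
		}
	}
	return nil
}

// runDemo builds a chain whose middle entry fails in BeforeAgent.
func runDemo() (*State, error) {
	record := func(name string) stageFunc {
		return func(_ context.Context, st *State) error {
			st.Log = append(st.Log, name)
			return nil
		}
	}
	fail := func(context.Context, *State) error { return errors.New("blocked") }
	chain := Chain{
		Funcs{Identifier: "logger", OnBeforeAgent: record("logger")},
		Funcs{Identifier: "guard", OnBeforeAgent: fail},
		Funcs{Identifier: "never", OnBeforeAgent: record("never")},
	}
	st := &State{}
	return st, chain.Execute(context.Background(), StageBeforeAgent, st)
}

func main() {
	st, err := runDemo()
	fmt.Println(err, st.Log) // middleware: guard: blocked [logger]
}
```

Because Funcs treats nil fields as no-ops, a middleware that only cares about one stage needs a single function literal, which is the boilerplate reduction the section describes.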
Location: pkg/hooks/types.go
type EventType string
const (
PreToolUse EventType = "PreToolUse"
PostToolUse EventType = "PostToolUse"
SessionStart EventType = "SessionStart"
SessionEnd EventType = "SessionEnd"
Stop EventType = "Stop"
SubagentStart EventType = "SubagentStart"
SubagentStop EventType = "SubagentStop"
)
Deleted events (9 total): PostToolUseFailure, PreCompact, ContextCompacted, UserPromptSubmit, PermissionRequest, Notification, TokenUsage, ModelSelected, MCPToolsChanged.
Why each was removed:
- PostToolUseFailure: Subsumed by PostToolUse -- the payload already contains an Err field.
- PreCompact, ContextCompacted: Compaction is an internal runtime concern and does not need hook intervention.
- UserPromptSubmit: YOLO mode does not need to validate user input via hooks.
- PermissionRequest: YOLO mode defaults to allow-all; dangerous command blocking is handled by the Go-native safety hook in PreToolUse, not a separate permission event.
- Notification, TokenUsage, ModelSelected, MCPToolsChanged: Niche events with zero known shell hook consumers in the wild.
Why SubagentStart/SubagentStop are kept: Subagent lifecycle visibility is essential for observability and tracing. Users need to know when child agents are spawned and terminated, especially for debugging long-running multi-agent workflows.
Location: pkg/api/agent.go
type Runtime struct {
opts Options // frozen configuration
model model.Model // the single model provider
registry *tool.Registry // tool registry
executor *tool.Executor // tool executor
hooks *hooks.Executor // hook executor
histories *historyStore // in-memory session histories
compactor *compactor // prompt-compression compactor
mu sync.RWMutex
closeOnce sync.Once
closeErr error
closed bool
}
Removed fields (from v1):
- cmdExec -- slash command execution folded into request path
- taskStore -- deleted (tasks removed)
- tokens -- token tracking is derivable from model responses
- recorder -- hook recording folded into per-request state
- sessionGate -- concurrent session guard removed (callers manage)
- historyPersister -- disk persistence removed from core
- rulesLoader -- rules loaded once at init, not stored as a field
- ownsTaskStore -- gone with taskStore
- sandbox -- moved to a tool executor concern
- sbRoot, cfg, fs, settings, mode, tracer -- consolidated or removed
Why: Most removed fields serve niche use cases (task tracking, disk persistence, session gating) that belong in application code. The Runtime should be a thin orchestrator, not a god object.
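The remaining lifecycle fields (mu, closeOnce, closeErr, closed) are enough for an idempotent Close, which the test plan checks ("double-close returns same error"). A minimal sketch, assuming a hypothetical closeFn cleanup hook and a hypothetical ErrClosed sentinel; only the four lifecycle fields come from the struct above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrClosed is a hypothetical sentinel returned by calls made after Close.
var ErrClosed = errors.New("api: runtime closed")

// Runtime keeps only the lifecycle fields from the struct above, plus a
// hypothetical closeFn standing in for real cleanup (MCP sessions, watchers).
type Runtime struct {
	mu        sync.RWMutex
	closeOnce sync.Once
	closeErr  error
	closed    bool
	closeFn   func() error
}

// Close runs cleanup exactly once; every call returns the same error.
func (r *Runtime) Close() error {
	r.closeOnce.Do(func() {
		r.mu.Lock()
		r.closed = true // new calls are rejected from here on
		r.mu.Unlock()
		if r.closeFn != nil {
			r.closeErr = r.closeFn()
		}
	})
	return r.closeErr
}

// Run shows the read-locked closed check a request path would perform.
func (r *Runtime) Run() error {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if r.closed {
		return ErrClosed
	}
	return nil
}

func main() {
	cleanups := 0
	r := &Runtime{closeFn: func() error {
		cleanups++
		return errors.New("mcp: close: broken pipe")
	}}
	err1, err2 := r.Close(), r.Close()
	fmt.Println(err1 == err2, cleanups, r.Run()) // true 1 api: runtime closed
}
```

sync.Once.Do blocks concurrent callers until the first cleanup returns, so reading closeErr afterwards is race-free without extra locking.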
User Code
│
▼
Runtime.Run(ctx, Request)
│
├── 1. Resolve session history (historyStore.getOrCreate)
├── 2. Append user message to history
├── 3. Build model.Request (system prompt + history + tools)
├── 4. LOOP:
│ │
│ ├── 4a. Check context cancellation
│ ├── 4b. Check max iterations
│ ├── 4c. middleware.Chain.Execute(BeforeAgent) [first iteration only]
│ ├── 4d. model.CompleteStream(ctx, req, handler)
│ │ └── handler accumulates StreamResults into Response
│ ├── 4e. Append assistant message to history
│ ├── 4f. If no tool calls or Done → break
│ ├── 4g. For each tool call:
│ │ ├── hooks.Execute(PreToolUse)
│ │ ├── SafetyHook check
│ │ ├── middleware.Chain.Execute(BeforeTool)
│ │ ├── tool.Execute(ctx, params)
│ │ ├── middleware.Chain.Execute(AfterTool)
│ │ ├── hooks.Execute(PostToolUse)
│ │ └── Append tool result to history
│ ├── 4h. compactor.MaybeCompact(history)
│ └── 4i. Rebuild model.Request with updated history
│
├── 5. Return final output
│
▼
User Code receives Result
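A minimal sketch of the loop above, assuming trimmed, hypothetical stand-ins for the pkg/model types (ToolCall, Message, Request, Response, StreamResult). It covers cancellation, the iteration cap, the direct CompleteStream call, and tool execution; middleware, hooks, the safety check, and compaction are omitted. ErrMaxIterations is the name used in the test plan; runLoop, scripted, and demo are illustrative names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical, trimmed stand-ins for the pkg/model types.
type ToolCall struct{ ID, Name string }
type Message struct {
	Role      string
	Content   string
	ToolCalls []ToolCall
	ToolID    string // set on tool-result messages
}
type Request struct{ Messages []Message }
type Response struct{ Message Message }
type StreamResult struct {
	Final    bool
	Response *Response
}
type StreamHandler func(StreamResult) error
type Model interface {
	Complete(ctx context.Context, req Request) (*Response, error)
	CompleteStream(ctx context.Context, req Request, cb StreamHandler) error
}

var ErrMaxIterations = errors.New("api: max iterations reached")

// runLoop calls CompleteStream directly, appends each assistant turn, runs the
// requested tools, and repeats until the model stops asking for tools.
func runLoop(ctx context.Context, m Model, history []Message, maxIter int,
	execTool func(context.Context, ToolCall) string) ([]Message, error) {
	for iter := 0; ; iter++ {
		if err := ctx.Err(); err != nil {
			return history, err // cancelled or timed out
		}
		if iter >= maxIter {
			return history, ErrMaxIterations
		}
		var final *Response
		err := m.CompleteStream(ctx, Request{Messages: history}, func(r StreamResult) error {
			if r.Final {
				final = r.Response // a real handler would also forward deltas
			}
			return nil
		})
		if err != nil {
			return history, fmt.Errorf("api: complete stream: %w", err)
		}
		if final == nil {
			return history, errors.New("api: stream ended without a final response")
		}
		history = append(history, final.Message)
		if len(final.Message.ToolCalls) == 0 {
			return history, nil // no tool calls: the run is finished
		}
		for _, call := range final.Message.ToolCalls {
			out := execTool(ctx, call)
			history = append(history, Message{Role: "tool", Content: out, ToolID: call.ID})
		}
	}
}

// scripted replays canned responses, like stubModel in the testing section.
type scripted struct {
	responses []Response
	index     int
}

func (s *scripted) Complete(_ context.Context, _ Request) (*Response, error) {
	r := Response{Message: Message{Role: "assistant", Content: "done"}}
	if s.index < len(s.responses) {
		r = s.responses[s.index]
		s.index++
	}
	return &r, nil
}
func (s *scripted) CompleteStream(ctx context.Context, req Request, cb StreamHandler) error {
	r, err := s.Complete(ctx, req)
	if err != nil {
		return err
	}
	return cb(StreamResult{Final: true, Response: r})
}

// demo runs one tool round-trip followed by a final "done" answer.
func demo(maxIter int) ([]Message, error) {
	m := &scripted{responses: []Response{
		{Message: Message{Role: "assistant", ToolCalls: []ToolCall{{ID: "t1", Name: "bash"}}}},
	}}
	return runLoop(context.Background(), m, []Message{{Role: "user", Content: "hi"}}, maxIter,
		func(_ context.Context, c ToolCall) string { return "ran " + c.Name })
}

func main() {
	hist, err := demo(5)
	fmt.Println(len(hist), err, hist[len(hist)-1].Content) // 4 <nil> done
}
```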
Runtime.RunStream(ctx, Request) → chan StreamEvent
│
└── Internally:
├── Same loop as Run()
├── model.CompleteStream callback emits deltas to channel
├── Tool calls/results emitted as structured events
└── Final event signals completion
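A minimal sketch of how RunStream can wrap the same blocking loop, assuming a hypothetical StreamEvent with a string Kind; runStream and collect are illustrative names, not v2 APIs. The loop emits through a callback, a goroutine forwards events to a buffered channel, and the channel is closed after the final event, which is what the streaming test in the plan expects.

```go
package main

import (
	"context"
	"fmt"
)

// StreamEvent is a hypothetical event shape with a string kind; the real
// type may carry structured tool-call and tool-result payloads instead.
type StreamEvent struct {
	Kind string // "delta", "tool_result", or "done"
	Text string
	Err  error
}

// runStream runs a blocking loop on a goroutine, forwards everything it emits
// to a channel, then attempts a final "done" event carrying the loop's error
// and closes the channel.
func runStream(ctx context.Context, loop func(emit func(StreamEvent)) error) <-chan StreamEvent {
	ch := make(chan StreamEvent, 16)
	go func() {
		defer close(ch)
		emit := func(ev StreamEvent) {
			select {
			case ch <- ev:
			case <-ctx.Done(): // stop blocking once the caller's context is cancelled
			}
		}
		emit(StreamEvent{Kind: "done", Err: loop(emit)})
	}()
	return ch
}

// collect drains a demo stream: two text deltas and one tool result.
func collect() []StreamEvent {
	ch := runStream(context.Background(), func(emit func(StreamEvent)) error {
		emit(StreamEvent{Kind: "delta", Text: "Hel"})
		emit(StreamEvent{Kind: "delta", Text: "lo"})
		emit(StreamEvent{Kind: "tool_result", Text: "ok"})
		return nil
	})
	var events []StreamEvent
	for ev := range ch {
		events = append(events, ev)
	}
	return events
}

func main() {
	for _, ev := range collect() {
		fmt.Println(ev.Kind, ev.Text, ev.Err)
	}
}
```

Selecting on ctx.Done() keeps the producer goroutine from leaking when a caller stops reading after cancelling its context.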
v1 data flow (three-layer indirection):
Runtime.Run → agent.Agent.Run → conversationModel.Generate → model.CompleteStream
↑ bridge adapter
v2 data flow (direct):
Runtime.Run → model.CompleteStream (inline loop)
The agent loop logic from pkg/agent/agent.go (lines 70-189) is inlined into Runtime.runLoop() in pkg/api. This eliminates:
- The agent.Model interface
- The agent.Context type (replaced by middleware.State)
- The agent.ToolExecutor interface (replaced by direct tool.Executor calls)
- The conversationModel bridge adapter (~90 lines)
v1 compaction (pkg/api/compact.go, 450 lines) mixes two concerns:
- Memory management (reduce context size)
- Information preservation (summarize intent)
In v2, compaction is still a model call (prompt compression) because it preserves intent better than pure dropping, but it must be tightly controlled:
- Preserve the last N messages unmodified.
- Strip tool I/O from the compressed portion so tool outputs do not dominate context.
- Avoid v1 complexity (no fallback model, no retries, no rollout writers).
Input: messages[] (full conversation history)
preserveCount (number of recent messages to keep, default 5)
threshold (trigger ratio, default 0.8)
tokenLimit (model context window size)
Trigger condition:
tokenCount(messages) / tokenLimit >= threshold
AND len(messages) > preserveCount
Algorithm:
1. cut = len(messages) - preserveCount
2. Compute toolTransactionSpans(messages)
3. For each span that straddles the cut point:
Move cut earlier to span.start (never orphan tool results in the preserved tail)
4. If cut <= 0: skip compaction (all messages are within one transaction)
5. head = messages[0:cut]
6. tail = messages[cut:]
7. Build compression input from head by filtering out tool-call/tool-result content
8. Call model.Complete(...) with a dedicated "prompt compression" instruction
9. Replace history with: [summaryMessage] + tail
10. Return: {tokensBefore, tokensAfter, preservedCount, summarySize}
For the compressed portion (head), tool-call/tool-result content is excluded from the compression input. The preserved tail is kept verbatim.
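A minimal sketch of the trigger condition and step 7 (building the compression input). It assumes a trimmed, hypothetical Message type in which tool results use the "tool" role and an assistant turn with empty Content only requested tools; shouldCompact, compressionInput, and withDefaults are illustrative names.

```go
package main

import "fmt"

// Message is a hypothetical, trimmed message shape.
type Message struct {
	Role      string   // "user", "assistant", or "tool"
	Content   string
	ToolCalls []string // tool names requested by an assistant turn
}

// CompactConfig here keeps only Threshold and PreserveCount; zero values
// fall back to the documented defaults (0.8 and 5).
type CompactConfig struct {
	Threshold     float64
	PreserveCount int
}

func (c CompactConfig) withDefaults() CompactConfig {
	if c.Threshold <= 0 {
		c.Threshold = 0.8
	}
	if c.PreserveCount <= 0 {
		c.PreserveCount = 5
	}
	return c
}

// shouldCompact is the trigger condition:
// tokenCount/tokenLimit >= threshold AND numMessages > preserveCount.
func shouldCompact(tokenCount, tokenLimit, numMessages int, cfg CompactConfig) bool {
	cfg = cfg.withDefaults()
	if tokenLimit <= 0 {
		return false
	}
	ratio := float64(tokenCount) / float64(tokenLimit)
	return ratio >= cfg.Threshold && numMessages > cfg.PreserveCount
}

// compressionInput drops tool-result messages and the tool-call part of
// assistant turns, keeping only conversational text from the head.
func compressionInput(head []Message) []Message {
	out := make([]Message, 0, len(head))
	for _, m := range head {
		if m.Role == "tool" || m.Content == "" {
			continue // tool output, or an assistant turn that only requested tools
		}
		out = append(out, Message{Role: m.Role, Content: m.Content})
	}
	return out
}

func main() {
	fmt.Println(shouldCompact(85_000, 100_000, 12, CompactConfig{})) // true
	head := []Message{
		{Role: "user", Content: "fix the build"},
		{Role: "assistant", ToolCalls: []string{"bash"}},
		{Role: "tool", Content: "(thousands of lines of compiler output)"},
		{Role: "assistant", Content: "The build fails on a missing import."},
	}
	fmt.Println(len(compressionInput(head))) // 2
}
```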
The toolTransactionSpans() function from v1 is reused (it is correct and ~40 lines). A tool transaction is the span from an assistant message containing tool calls to the last corresponding tool result message. The compaction cut point is always adjusted to avoid splitting a transaction.
type toolTransactionSpan struct {
start int // index of assistant message with tool calls
end int // index after last tool result message
}
Edge case: If the entire history is one unfinished tool transaction, compaction is skipped. This is correct behavior -- you cannot drop messages mid-transaction.
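A minimal sketch of steps 2-4 of the algorithm, assuming assistant messages carry the IDs of the tool calls they issue and tool messages carry the ID they answer. This is an illustration, not the v1 toolTransactionSpans code; adjustCut is a hypothetical helper. An unanswered call opens a span that runs to the end of the history, which yields the skip behavior in the edge case.

```go
package main

import "fmt"

// Message is a hypothetical, trimmed message: assistant turns list the IDs of
// the tool calls they issue, and tool messages name the call they answer.
type Message struct {
	Role        string
	ToolCallIDs []string // set on assistant messages that request tools
	ToolCallID  string   // set on tool-result messages
}

type toolTransactionSpan struct {
	start int // index of assistant message with tool calls
	end   int // index after last tool result message
}

// toolTransactionSpans pairs each tool-calling assistant message with the last
// tool result that answers one of its calls. If any call is still unanswered,
// the span runs to the end of the history (an unfinished transaction).
func toolTransactionSpans(msgs []Message) []toolTransactionSpan {
	var spans []toolTransactionSpan
	for i, m := range msgs {
		if m.Role != "assistant" || len(m.ToolCallIDs) == 0 {
			continue
		}
		pending := make(map[string]bool, len(m.ToolCallIDs))
		for _, id := range m.ToolCallIDs {
			pending[id] = true
		}
		end := i + 1
		for j := i + 1; j < len(msgs) && len(pending) > 0; j++ {
			if msgs[j].Role == "tool" && pending[msgs[j].ToolCallID] {
				delete(pending, msgs[j].ToolCallID)
				end = j + 1
			}
		}
		if len(pending) > 0 {
			end = len(msgs)
		}
		spans = append(spans, toolTransactionSpan{start: i, end: end})
	}
	return spans
}

// adjustCut starts from len(msgs)-preserveCount and moves the cut to span.start
// whenever a span straddles it (start < cut < end), repeating until stable.
// A result <= 0 means compaction is skipped.
func adjustCut(msgs []Message, preserveCount int) int {
	cut := len(msgs) - preserveCount
	spans := toolTransactionSpans(msgs)
	for moved := true; moved; {
		moved = false
		for _, s := range spans {
			if s.start < cut && cut < s.end {
				cut = s.start
				moved = true
			}
		}
	}
	return cut
}

func main() {
	msgs := []Message{
		{Role: "user"},
		{Role: "assistant", ToolCallIDs: []string{"a", "b"}},
		{Role: "tool", ToolCallID: "a"},
		{Role: "tool", ToolCallID: "b"},
		{Role: "assistant"},
		{Role: "user"},
	}
	// preserveCount 3 puts the naive cut at index 3, inside the span [1, 4).
	fmt.Println(toolTransactionSpans(msgs), adjustCut(msgs, 3)) // [{1 4}] 1
}
```

Every move strictly decreases the cut, so the fixed-point loop always terminates.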
type CompactConfig struct {
Enabled bool `json:"enabled"`
Threshold float64 `json:"threshold"` // trigger ratio (default 0.8)
PreserveCount int `json:"preserve_count"` // keep latest N messages (default 5)
}
Removed fields (from v1 CompactConfig):
- SummaryModel, FallbackModel -- compaction uses the primary model with a dedicated prompt
- MaxRetries, RetryDelay -- no retries in core compaction
- PreserveInitial, InitialCount -- system prompt is injected per-request, not stored in history
- PreserveUserText, UserTextTokens -- over-engineering for edge cases
- RolloutDir -- no rollout persistence in core
The entire compaction implementation targets <= 200 non-test lines in a single file pkg/api/compact.go.
In v1, when a subagent completes work, the parent agent receives the raw result but has no mechanism to inject a structured summary into its own context. The subagent's work is a "black box" -- the parent knows the task completed but not what was learned.
When a subagent completes via Manager.Dispatch(), the result is formatted as a structured summary and injected into the parent agent's conversation history as a user-role message:
// In Runtime, after subagent dispatch returns:
summary := formatSubagentSummary(result)
parentHistory.Append(message.Message{
Role: "user", // user role so model treats it as context
Content: summary,
})
[Subagent Result: {name}]
Task: {instruction}
Status: {success|error}
Output: {result.Output, truncated to 2000 chars}
{if error: Error: {result.Error}}
The summary is injected as a user message (not system) because:
- System messages in Anthropic API have special handling and token costs
- User messages are treated as conversation context the model should address
- The model naturally responds to user-role information
Summary injection is enabled by default when subagents are registered. It can be disabled via Options.DisableSubagentSummary bool.
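A minimal sketch of formatSubagentSummary following the template above, assuming a hypothetical Result with Name, Instruction, Output, and Error string fields. Truncation counts runes rather than bytes so a multi-byte character is never split; whether v2 counts runes or bytes is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a hypothetical subagent result; the field names are assumptions.
type Result struct {
	Name        string
	Instruction string
	Output      string
	Error       string
}

const maxSummaryOutput = 2000 // characters, per the template

// truncateRunes cuts s to at most n characters without splitting UTF-8 sequences.
func truncateRunes(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n])
}

func formatSubagentSummary(res Result) string {
	status := "success"
	if res.Error != "" {
		status = "error"
	}
	var b strings.Builder
	fmt.Fprintf(&b, "[Subagent Result: %s]\n", res.Name)
	fmt.Fprintf(&b, "Task: %s\n", res.Instruction)
	fmt.Fprintf(&b, "Status: %s\n", status)
	fmt.Fprintf(&b, "Output: %s", truncateRunes(res.Output, maxSummaryOutput))
	if res.Error != "" {
		fmt.Fprintf(&b, "\nError: %s", res.Error)
	}
	return b.String()
}

func main() {
	fmt.Println(formatSubagentSummary(Result{
		Name: "explore", Instruction: "find the config loader", Output: "pkg/config/loader.go",
	}))
}
```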
The safety hook replaces v1's complex permission system (pkg/security/approval.go, pkg/security/permission_matcher.go, pkg/security/resolver.go) with a single Go function that blocks catastrophic operations.
// SafetyHook is called before every tool execution. Return a non-nil
// error to block the tool call. The error message is returned to the model.
type SafetyHook func(ctx context.Context, toolName string, params map[string]any) error
The default hook reuses the blocklist patterns from pkg/security/validator.go:
func DefaultSafetyHook(ctx context.Context, toolName string, params map[string]any) error {
if toolName != "bash" && toolName != "Bash" {
return nil // only bash commands need safety validation
}
command, ok := params["command"].(string)
if !ok || command == "" {
return nil
}
return defaultValidator.Validate(command)
}
Blocked patterns (same as v1 security.Validator):
- Commands: dd, mkfs, fdisk, parted, shutdown, reboot, halt, poweroff, mount, sudo
- Fragments: rm -rf, rm -fr, rm -r, rm --recursive, rmdir -p, rm *, rm /, -rf /, --no-preserve-root
- Arguments: --no-preserve-root, --preserve-root=false, /dev/, ../
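defaultValidator is not shown in this document. A minimal sketch of how the three pattern groups could be applied, under assumed matching rules: fragments are substring checks on the whole command; commands match the base name of the first word of each segment split on ;, |, & and newlines; arguments are substring checks on the remaining words. These rules are assumptions, not v1's security.Validator logic, and validateCommand is a hypothetical stand-in for defaultValidator.Validate.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

var (
	blockedCommands = map[string]bool{
		"dd": true, "mkfs": true, "fdisk": true, "parted": true, "shutdown": true,
		"reboot": true, "halt": true, "poweroff": true, "mount": true, "sudo": true,
	}
	blockedFragments = []string{
		"rm -rf", "rm -fr", "rm -r", "rm --recursive", "rmdir -p",
		"rm *", "rm /", "-rf /", "--no-preserve-root",
	}
	blockedArgSubstrings = []string{"--no-preserve-root", "--preserve-root=false", "/dev/", "../"}
)

// validateCommand returns a non-nil error when the command matches a blocked pattern.
func validateCommand(command string) error {
	for _, frag := range blockedFragments {
		if strings.Contains(command, frag) {
			return fmt.Errorf("hooks: safety: blocked fragment %q", frag)
		}
	}
	// Split chained commands so "echo hi && sudo reboot" checks "sudo" too.
	segments := strings.FieldsFunc(command, func(r rune) bool {
		return r == ';' || r == '|' || r == '&' || r == '\n'
	})
	for _, seg := range segments {
		fields := strings.Fields(seg)
		if len(fields) == 0 {
			continue
		}
		if name := path.Base(fields[0]); blockedCommands[name] {
			return fmt.Errorf("hooks: safety: blocked command %q", name)
		}
		for _, arg := range fields[1:] {
			for _, bad := range blockedArgSubstrings {
				if strings.Contains(arg, bad) {
					return fmt.Errorf("hooks: safety: blocked argument %q", arg)
				}
			}
		}
	}
	return nil
}

func main() {
	for _, cmd := range []string{"ls -la", "rm -rf /", "echo hi && sudo reboot", "cat ../secrets"} {
		fmt.Println(cmd, "=>", validateCommand(cmd))
	}
}
```

Substring matching is deliberately coarse: it favors false positives (for example, "rm -r" also matches inside "rm -rf") over missing a destructive command.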
By default, Options.SafetyHook is set to DefaultSafetyHook. This means:
- All tool calls are allowed without interactive permission prompts
- Only catastrophic bash commands are blocked
- No "ask" or "deny" permission rules -- tools just execute
Users who need stricter security can:
- Replace the safety hook: Options.SafetyHook = myCustomHook
- Use sandbox isolation: Options.Sandbox enables filesystem/network guards
- Use permission hooks: Register PreToolUse hooks that return deny decisions
The safety hook is a Go function call, not a shell command. Overhead: <1ms per tool call (string matching against the roughly 20 patterns listed above). No process spawning, no JSON serialization.
Priority (high → low):
1. Runtime overrides (Options.SettingsOverrides)
2. .agents/settings.local.json (gitignored, developer-specific)
3. .agents/settings.json (project-level, tracked)
4. SDK defaults
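A minimal sketch of the four-layer priority, assuming Settings trimmed to three fields and simple merge rules (non-empty strings and non-nil pointers override, maps merge per key). The real SettingsLoader may merge differently; merge and boolPtr are hypothetical helpers.

```go
package main

import "fmt"

// Settings is trimmed to three fields for illustration.
type Settings struct {
	Model            string
	Env              map[string]string
	RespectGitignore *bool
}

// merge applies layers from lowest to highest priority:
// SDK defaults, settings.json, settings.local.json, runtime overrides.
func merge(layers ...*Settings) Settings {
	out := Settings{Env: map[string]string{}}
	for _, l := range layers {
		if l == nil {
			continue // a missing file or unset override contributes nothing
		}
		if l.Model != "" {
			out.Model = l.Model
		}
		for k, v := range l.Env {
			out.Env[k] = v
		}
		if l.RespectGitignore != nil {
			v := *l.RespectGitignore // copy so later edits to a layer cannot leak in
			out.RespectGitignore = &v
		}
	}
	return out
}

func boolPtr(b bool) *bool { return &b }

func main() {
	defaults := &Settings{Model: "default-model", RespectGitignore: boolPtr(true)}
	project := &Settings{Env: map[string]string{"LOG": "info"}}
	local := &Settings{Env: map[string]string{"LOG": "debug"}, RespectGitignore: boolPtr(false)}
	var overrides *Settings // Options.SettingsOverrides left unset
	s := merge(defaults, project, local, overrides)
	fmt.Println(s.Model, s.Env["LOG"], *s.RespectGitignore) // default-model debug false
}
```

Pointer fields such as RespectGitignore *bool are what let a higher layer explicitly set false without being mistaken for "not set".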
The config.Settings struct is unchanged from v1. Key fields:
type Settings struct {
Permissions *PermissionsConfig `json:"permissions,omitempty"`
Hooks *HooksConfig `json:"hooks,omitempty"`
Env map[string]string `json:"env,omitempty"`
Model string `json:"model,omitempty"`
MCP *MCPConfig `json:"mcp,omitempty"`
Sandbox *SandboxConfig `json:"sandbox,omitempty"`
DisallowedTools []string `json:"disallowedTools,omitempty"`
RespectGitignore *bool `json:"respectGitignore,omitempty"`
// ... other fields preserved
}
Configuration hot-reload via fsnotify is preserved. When .agents/settings.json or .agents/settings.local.json changes:
- Re-load and merge settings
- Update tool registry (add/remove tools per DisallowedTools)
- Update hook executor (register/unregister hooks per Hooks)
- No restart required
.agents/
├── settings.json # Project configuration (tracked)
├── settings.local.json # Developer overrides (gitignored)
├── skills/ # Skill definitions (*.md files)
└── agents/ # Subagent definitions
At Runtime.New(), the runtime optionally loads AGENTS.md (with @include support) and appends it under a ## Memory header to the system prompt. Project rules are loaded from .agents/rules and consumed during initialization (they are not stored on the Runtime).
| Responsibility | Description |
|---|---|
| Runtime struct | Slim orchestrator with ~7 fields |
| Agent loop | Inlined from pkg/agent, drives model/tool iteration |
| Compaction | Prompt compression with tool I/O stripping, <=200 lines |
| System prompt | Assembled from rules, skills, session context |
| Request handling | Run(), RunStream(), request normalization |
| History management | Session-keyed in-memory history store |
| Lifecycle | New(), Close() with sync.Once idempotency |
| Responsibility | Description |
|---|---|
| Model interface | Single Complete/CompleteStream interface |
| AnthropicProvider | Anthropic SDK wrapper, streaming, token counting |
| OpenAIProvider | OpenAI SDK wrapper, chat completions + responses API |
| Types | Message, Request, Response, StreamResult, ToolCall, ToolDefinition, Usage, ContentBlock |
| Provider factory | NewAnthropicProvider(), NewOpenAI() constructors |
| Responsibility | Description |
|---|---|
| Tool interface | Name(), Description(), Schema(), Execute() |
| Registry | Thread-safe tool registration, lookup, listing |
| Executor | JSON Schema validation, tool dispatch, error wrapping |
| Validator | Parameter validation against JSON Schema |
| MCP tools | MCP client session management, tool proxy |
| Built-in tools | bash, read, write, edit, glob, grep, skill |
Deleted tools: todo_write (dead code), task/taskcreate/taskget/tasklist/taskupdate/killtask (tasks removed), slashcommand (commands removed), askuserquestion (YOLO mode), webfetch/websearch (not core), bash_output/bash_status (not core).
| Responsibility | Description |
|---|---|
| Middleware interface | 4-stage interception: BeforeAgent, BeforeTool, AfterTool, AfterAgent |
| Chain | Sequential execution with short-circuit semantics |
| Funcs helper | Function-pointer adapter for quick middleware creation |
| State | Shared mutable state across middleware invocations |
| TraceMiddleware | JSONL + HTML trace viewer (OTel integration) |
| HTTP trace | HTTP-specific trace middleware variant |
| Responsibility | Description |
|---|---|
| EventType | 7 event type constants |
| Event | Lightweight event struct with type, session ID, payload |
| Payload types | ToolUsePayload, ToolResultPayload, SessionPayload, StopPayload, SubagentPayload |
| Executor | Shell hook spawner with JSON stdin, exit code semantics |
| Selector | Regex-based hook matching by tool name and payload pattern |
| ShellHook | Hook definition: event, command, timeout, async, once |
Merged from: pkg/core/events (types + payloads) + pkg/core/hooks (executor + selector).
Also merged: pkg/core/middleware (24 lines, the Handler/Middleware/Chain types for hook middleware) becomes internal to pkg/hooks.
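The Selector row describes regex matching by tool name and payload pattern. A minimal sketch under stated assumptions: the tool-name pattern must match the whole name (wrapped in ^(?:...)$), the payload pattern may match anywhere in the serialized JSON payload, and an empty pattern matches everything. Selector, NewSelector, and Match here are hypothetical, not the v2 API.

```go
package main

import (
	"fmt"
	"regexp"
)

// Selector is a hypothetical sketch of hook matching; a nil pattern matches anything.
type Selector struct {
	tool    *regexp.Regexp // must match the whole tool name
	payload *regexp.Regexp // may match anywhere in the JSON payload
}

// NewSelector compiles both patterns; an empty expression means "match all".
func NewSelector(toolExpr, payloadExpr string) (Selector, error) {
	var s Selector
	var err error
	if toolExpr != "" {
		if s.tool, err = regexp.Compile("^(?:" + toolExpr + ")$"); err != nil {
			return Selector{}, fmt.Errorf("hooks: selector: tool pattern: %w", err)
		}
	}
	if payloadExpr != "" {
		if s.payload, err = regexp.Compile(payloadExpr); err != nil {
			return Selector{}, fmt.Errorf("hooks: selector: payload pattern: %w", err)
		}
	}
	return s, nil
}

// Match reports whether a hook should fire for this tool call.
func (s Selector) Match(toolName, payload string) bool {
	if s.tool != nil && !s.tool.MatchString(toolName) {
		return false
	}
	return s.payload == nil || s.payload.MatchString(payload)
}

func main() {
	sel, err := NewSelector("Bash|Write", `git push`)
	if err != nil {
		panic(err)
	}
	fmt.Println(sel.Match("Bash", `{"command":"git push origin main"}`)) // true
	fmt.Println(sel.Match("Read", `{"command":"git push"}`))             // false
}
```

Anchoring the tool pattern keeps "Bash" from also matching a tool named, say, "BashOutput".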
| Responsibility | Description |
|---|---|
| Settings types | Settings, PermissionsConfig, HooksConfig, MCPConfig, SandboxConfig |
| SettingsLoader | Layer-based settings merge (defaults < project < local < runtime) |
| RulesLoader | .agents/rules project rules loading |
| Hot-reload | fsnotify-based configuration watching |
| FS abstraction | Testable filesystem operations |
| Responsibility | Description |
|---|---|
| History | Thread-safe in-memory message store |
| Message | Role + content + tool calls + metadata |
| TokenCounter | Naive token estimation |
| Clone utilities | Deep copy for isolation |
| Responsibility | Description |
|---|---|
| Manager | Filesystem path whitelist, symlink resolution |
| Network isolation | Allow-list for outbound connections |
Note: Command validation (safety hook) moves to pkg/hooks/safety.go, not sandbox. Sandbox is purely for path/network isolation. The pkg/security/ package is deleted — its Validator logic is extracted into the Go-native DefaultSafetyHook function in pkg/hooks/.
| Responsibility | Description |
|---|---|
| ClientSession | MCP protocol client |
| Transport | stdio and SSE transport builders |
| Tool proxy | Convert MCP tools to tool.Tool interface |
| Responsibility | Description |
|---|---|
| Registry | Skill definition storage and lookup |
| Loader | Load skills from .agents/skills/ directory |
| Matcher | Pattern matching for skill activation |
| Prompt templates | Skill-related prompt template rendering (absorbed from pkg/prompts) |
| Responsibility | Description |
|---|---|
| Manager | Subagent registration and dispatch |
| Definition | Subagent type definitions (general-purpose, explore, plan) |
| Context | Subagent execution context with tool whitelist |
| Result | Subagent output + summary formatting |
type Options struct {
// Required
ModelFactory model.Model
ProjectRoot string
// Core behavior
MaxIterations int
Timeout time.Duration
MaxSessions int
// Compaction
Compact CompactConfig
// Extension points
Middleware []middleware.Middleware
SafetyHook SafetyHook // default: DefaultSafetyHook
Hooks []hooks.ShellHook
// Configuration
SettingsOverrides *config.Settings
// Optional features
MCP *config.MCPConfig
Sandbox *SandboxOptions
Skills []SkillRegistration
Subagents []SubagentRegistration
CustomTools []tool.Tool
// Metadata
EntryPoint EntryPoint
}
ACP (pkg/acp) and tasks (pkg/runtime/tasks) are deleted in v2 core. The repository does not carry a contrib/ module for them.
- Context: pkg/agent (233 lines) defines agent.Model and the agent loop. pkg/api wraps it with a conversationModel bridge adapter to convert between model.Model and agent.Model.
- Options considered: (A) Keep separate packages, simplify bridge. (B) Merge pkg/agent into pkg/api.
- Decision: Option B -- merge into pkg/api.
- Rationale: The agent loop is tightly coupled to Runtime's history management, compaction, and hook dispatch. Separating them requires a bridge adapter that adds complexity without modularity benefit. The agent loop is ~120 lines of logic -- not large enough to justify its own package.
- Consequences: The pkg/agent package is deleted. The agent.Model, agent.Context, agent.ToolCall, agent.ToolResult, and agent.ModelOutput types are all deleted. Any code referencing these types must be updated.
- Context: v1 compaction calls an LLM to summarize dropped messages (450 lines including retry, fallback model, rollout writer).
- Options considered: (A) Pure strip: drop oldest messages, no summarization. (B) Prompt compression: summarize old content using the model. (C) Keep v1 approach with fallback/retry/rollout.
- Decision: Option B -- prompt compression, but strip tool I/O from the compressed portion and keep the preserved tail verbatim.
- Rationale: Tool outputs are high-token noise; removing them from compression input improves summary signal. Pure strip loses intent. Keeping v1 fallback/retry/rollout complexity is not justified in core.
- Consequences: No SummaryModel/FallbackModel fields; no retry/fallback/rollout machinery in core compaction. Compaction behavior is testable with a stub model.
- Context: v1 has 6 stages: BeforeAgent, BeforeModel, AfterModel, BeforeTool, AfterTool, AfterAgent. BeforeModel/AfterModel add per-iteration overhead, but each model request is fully determined by the current history.
- Options considered: (A) Keep 6 stages. (B) Remove BeforeModel/AfterModel, keep 4 (BeforeAgent, BeforeTool, AfterTool, AfterAgent).
- Decision: Option B -- 4 stages.
- Rationale: BeforeModel/AfterModel fire every iteration of the agent loop. In practice, the only consumer was trace middleware, which can achieve the same observability via BeforeAgent (request boundary) and AfterAgent (response boundary) plus the State.ModelInput/State.ModelOutput fields. Removing per-iteration middleware overhead also improves latency for multi-iteration runs.
- Consequences: Trace middleware uses BeforeAgent/AfterAgent for request/response-level tracing. Model-level detail (request, response, usage) is available via middleware.State fields populated by the agent loop. Two fewer methods on the interface means less boilerplate for every middleware implementation.
- Context: v1 has a multi-layer permission system: PermissionsConfig with allow/deny/ask rules, ApprovalRecord persistence, PermissionRequestHandler callbacks, and security.Resolver resolution logic.
- Options considered: (A) Keep full permission system. (B) YOLO default with safety hook. (C) Remove all security.
- Decision: Option B -- YOLO default with Go-native safety hook.
- Rationale: The permission system adds ~800 lines of code for a feature most SDK users don't use (they have their own security layers). A simple Go function that blocks rm -rf / and sudo covers the catastrophic-mistake case with <1ms overhead and zero configuration.
- Consequences: security.ApprovalRecord, security.Resolver, and api.PermissionRequestHandler are removed from core. Permission-based security can be re-added via PreToolUse hooks. The Permissions field in Settings still exists (for settings file compat) but only deny rules are enforced by default.
- Context: v1 has pkg/core/events (types + payloads, 549 lines) and pkg/core/hooks (executor, 732 lines) as separate packages. The hooks package imports events. No other package imports events without also importing hooks.
- Options considered: (A) Keep separate. (B) Merge into a single pkg/hooks package.
- Decision: Option B -- merge into pkg/hooks.
- Rationale: Events and hooks are a single concern: "lifecycle notifications with optional shell command execution." Separating them creates an artificial boundary that requires cross-package imports without enabling independent use. Merging reduces the package count by 2 (events + core container).
- Consequences: Import paths change from pkg/core/events and pkg/core/hooks to pkg/hooks. The pkg/core/ directory is deleted entirely (its only other content, pkg/core/middleware at 24 lines, merges into pkg/middleware).
- Context: pkg/security contains Validator (command validation), ApprovalRecord, PermissionMatcher, and Resolver. With YOLO default, only Validator survives.
- Options considered: (A) Keep pkg/security with just Validator. (B) Merge Validator into pkg/sandbox. (C) Merge Validator into pkg/hooks as the DefaultSafetyHook.
- Decision: Option C -- merge into pkg/hooks.
- Rationale: The Validator is functionally a PreToolUse hook that blocks dangerous bash commands. This is exactly what the safety hook does. Placing it in pkg/hooks/safety.go makes the relationship explicit: safety validation is a built-in hook, not a separate security layer. pkg/sandbox remains focused on path/network isolation (an orthogonal concern).
- Consequences: pkg/security/ is deleted entirely. hooks.DefaultSafetyHook absorbs the validator patterns. Import paths update accordingly. Approval/permission matcher code is deleted (YOLO default).
- Requirement-driven: Each test traces to a PRD story or acceptance criterion.
- No real API calls: All tests use mock/stub Model implementations.
- No v1 test porting: Tests are written from scratch against v2 interfaces.
- Table-driven: Multiple scenarios in a single test function where applicable.
| Priority | Test Area | What to Test | Traces To |
|---|---|---|---|
| P0 | Runtime lifecycle | New() with minimal options succeeds; Close() is idempotent; double-close returns same error | Story 2, A2 |
| P0 | Single-prompt run | Run() with stub model returns expected output; history contains user + assistant messages | Story 2, A14 |
| P0 | Streaming run | RunStream() emits deltas and final event; channel closes after completion | Story 2 |
| P0 | Tool execution | Model returns tool call; tool executes; result fed back to model; model produces final output | Story 7, A9 |
| P0 | Compaction trigger | History exceeds threshold; compaction invokes prompt compression; preserved tail kept verbatim | Story 3, A4, A5 |
| P0 | Compaction strips tool I/O | Compression input excludes tool-call/tool-result content | Story 3, A4 |
| P0 | Middleware chain | 4-stage execution in correct order; error in BeforeAgent short-circuits | Story 6, A8 |
| P0 | Event dispatch | PreToolUse/PostToolUse hooks fire with correct payloads | Story 5, A7 |
| P0 | Context cancellation | Cancelled context stops agent loop; returns context error | General |
| P0 | Safety hook | rm -rf / blocked; ls allowed; custom hook overrides default | FR-9 |
| P1 | Max iterations | Loop stops at MaxIterations; returns ErrMaxIterations | General |
| P1 | Session isolation | Two sessions have independent histories | General |
| P1 | MCP tool registration | MCP tools appear in registry; execute correctly | General |
| P1 | Config hot-reload | Settings change triggers tool registry update | General |
```go
package api

import (
	"context"

	"github.com/stellarlinkco/agentsdk-go/pkg/model"
)

// stubModel replays canned responses in order, then answers "done".
// It never calls a real API, which keeps tests deterministic.
type stubModel struct {
	responses []model.Response
	index     int
}

func (m *stubModel) Complete(ctx context.Context, req model.Request) (*model.Response, error) {
	// Honor cancellation so context-cancellation tests exercise the real path.
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if m.index >= len(m.responses) {
		return &model.Response{Message: model.Message{Content: "done"}}, nil
	}
	resp := m.responses[m.index]
	m.index++
	return &resp, nil
}

// CompleteStream delivers the whole response as a single final chunk.
func (m *stubModel) CompleteStream(ctx context.Context, req model.Request, cb model.StreamHandler) error {
	resp, err := m.Complete(ctx, req)
	if err != nil {
		return err
	}
	return cb(model.StreamResult{Final: true, Response: resp})
}
```

Tests are co-located with implementation:
- `pkg/api/agent_test.go` -- Runtime lifecycle, run, streaming
- `pkg/api/compact_test.go` -- Compaction logic
- `pkg/middleware/chain_test.go` -- Middleware chain execution
- `pkg/hooks/executor_test.go` -- Hook execution, event dispatch
- `pkg/tool/registry_test.go` -- Tool registration, lookup
- `pkg/model/anthropic_test.go` -- Anthropic provider
- `pkg/model/openai_test.go` -- OpenAI provider
```bash
# All tests pass
go test ./pkg/...
# No v1 references
grep -r "agent.Model" pkg/  # expect 0 matches
grep -r "agent.Context" pkg/  # expect 0 matches
# Package count (-mindepth 1 keeps pkg/ itself out of the count)
find pkg -mindepth 1 -type d | wc -l  # expect <= 11
# Line count (cat | wc -l prints one total even if the file list is long)
find pkg -name '*.go' ! -name '*_test.go' -exec cat {} + | wc -l  # expect <= 20000
# Build
go build ./...
```

Step 1: Create v2 branch
Step 2: Merge agent.Model into model.Model
- Delete pkg/agent/
- Inline agent loop into pkg/api/agent.go
- Delete conversationModel bridge adapter
- Update all references
Step 3: Slim Runtime struct
- Remove 10 fields
- Simplify New() constructor
Step 4: Replace compaction
- Rewrite compact.go (prompt compression, <=200 lines)
- Strip tool I/O from compression input
- No fallback/retry/rollout writer
- Keep toolTransactionSpans() for tail boundary safety
Step 5: Merge packages
- core/events + core/hooks → pkg/hooks
- core/middleware → pkg/middleware
- Delete pkg/core/ entirely
Step 6: Reduce events 16 → 7
Step 7: Reduce middleware 6 → 4 stages
Gate: go build ./... passes
Step 8: Delete todo_write tool
Step 9: Delete task tools + `pkg/runtime/tasks/`
Step 10: Delete `pkg/acp/` and remove CLI ACP mode
Step 11: Remove slashcommand, askuserquestion, webfetch, websearch, bash_output, bash_status from core tools
Step 12: Delete pkg/runtime/commands/ (slash commands removed)
Step 13: Merge prompts: skill templates → runtime/skills
Step 14: Absorb security → hooks/safety.go
Step 15: Implement safety hook
Gate: go build ./... passes
Step 16: Rewrite core tests (Runtime, run, streaming, tools, compaction)
Step 17: Rewrite middleware/hooks tests
Step 18: Update all 12 examples
Step 19: Verify line count and package count targets
Step 20: Final go test ./pkg/... pass
Gate: All acceptance criteria from PRD pass
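Step 4 keeps `toolTransactionSpans()` so the compaction cut never separates an assistant tool call from its results. The sketch below only illustrates that invariant under a hypothetical `Message` shape (`Role`, `ToolCalls`); `transactionSpans` and `safeTailStart` are illustrative names, not the v1 function's signature, which the implementation notes say to copy as-is.

```go
package main

// Message is a hypothetical history entry; Role is "user", "assistant", or "tool".
type Message struct {
	Role      string
	ToolCalls int // number of tool calls requested by an assistant message
}

// transactionSpans returns [start, end) ranges covering each assistant
// tool-call message plus the tool-result messages that immediately follow it.
func transactionSpans(msgs []Message) [][2]int {
	var spans [][2]int
	for i := 0; i < len(msgs); i++ {
		if msgs[i].Role != "assistant" || msgs[i].ToolCalls == 0 {
			continue
		}
		end := i + 1
		for end < len(msgs) && msgs[end].Role == "tool" {
			end++
		}
		spans = append(spans, [2]int{i, end})
		i = end - 1 // resume scanning after this transaction
	}
	return spans
}

// safeTailStart moves a proposed cut (start index of the preserved tail)
// back to the start of any tool transaction it would otherwise split.
func safeTailStart(msgs []Message, cut int) int {
	for _, sp := range transactionSpans(msgs) {
		if sp[0] < cut && cut < sp[1] {
			return sp[0]
		}
	}
	return cut
}
```

Moving the cut earlier (rather than later) errs on the side of preserving more verbatim history, which matches the "preserved tail kept verbatim" requirement.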
- Start with Step 2 (Model merge). Everything else depends on eliminating the dual interface. Do not attempt parallel execution of Phase 1 steps -- they are sequential.
- Reuse `toolTransactionSpans()` from `pkg/api/compact.go` lines 399-436. It is correct, well-tested, and ~40 lines. Copy it into the new `compact.go`.
- The agent loop is ~120 lines of actual logic (the rest of `pkg/agent/agent.go` is type definitions and the constructor). When inlining, keep the loop structure but replace `a.model.Generate()` with `r.model.CompleteStream()` and manage history inline.
- Do not create a new `agent.Context` equivalent. Use `middleware.State` directly. The agent context was thin glue that added no value.
- Import path changes are mechanical. After merging `pkg/core/*`, find-and-replace `coreevents "github.com/stellarlinkco/agentsdk-go/pkg/core/events"` with `"github.com/stellarlinkco/agentsdk-go/pkg/hooks"` across all files, then rename the `coreevents.` qualifiers to `hooks.`.
Enforce these rules mechanically (CI checks):
- `grep -r "pkg/agent" pkg/` returns 0 matches (after Phase 1 Step 2)
- `rg -n "\\bpkg/acp\\b" pkg cmd -S` returns 0 matches (after Phase 2)
- `rg -n "\\bpkg/runtime/tasks\\b" pkg cmd -S` returns 0 matches (after Phase 2)
- `find pkg -mindepth 1 -type d | wc -l` <= target (after Phase 2)
- No `BeforeModel` or `AfterModel` references in `pkg/middleware/types.go`
- Exactly 7 `EventType` constants in `pkg/hooks/`
- `go build ./...` passes (gate between phases)
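The "exactly 7 `EventType` constants" rule can be checked with the standard library's `go/parser` instead of a grep, which would miss iota-style declarations where later constants omit the type. This sketch assumes the constants are declared with an explicit `EventType` type; `countEventTypeConsts` and `countInFiles` are hypothetical helper names.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
)

// countEventTypeConsts counts constants of type EventType in one Go source
// file. src may be a string/[]byte, or nil to read filename from disk.
// Inside a const block, a spec with neither a type nor values repeats the
// previous spec (the iota idiom), so the previous type carries over.
func countEventTypeConsts(filename string, src any) (int, error) {
	f, err := parser.ParseFile(token.NewFileSet(), filename, src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, decl := range f.Decls {
		gd, ok := decl.(*ast.GenDecl)
		if !ok || gd.Tok != token.CONST {
			continue
		}
		var typ ast.Expr
		for _, spec := range gd.Specs {
			vs := spec.(*ast.ValueSpec)
			if vs.Type != nil || len(vs.Values) > 0 {
				typ = vs.Type // an untyped "x = 3" resets typ to nil
			}
			if id, ok := typ.(*ast.Ident); ok && id.Name == "EventType" {
				n += len(vs.Names)
			}
		}
	}
	return n, nil
}

// countInFiles sums the count over files read from disk.
func countInFiles(paths []string) (int, error) {
	total := 0
	for _, p := range paths {
		n, err := countEventTypeConsts(p, nil)
		if err != nil {
			return 0, err
		}
		total += n
	}
	return total, nil
}
```

In CI, one would pass the non-test `.go` files from `pkg/hooks/` (for example, collected with `filepath.Glob`) to `countInFiles` and fail the build unless the total is 7.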
- Most important test: Runtime + stub model + tool call + compaction. This single end-to-end path covers 60%+ of the codebase.
- Compaction test must include: A stub model that records the compression request and asserts tool I/O is excluded from compression input.
- Middleware test: Verify 4 stages fire in order: BeforeAgent -> BeforeTool -> AfterTool -> AfterAgent. Verify error in BeforeAgent prevents tool execution.
- OQ1: Exact naming for the merged Model interface methods -- `Complete`/`CompleteStream` is the current choice and should be kept.
- OQ2: Whether `askuserquestion` should remain in core -- currently removed per PRD assumption A3.
- `pkg/security` absorption: Resolved -- `security.Validator` patterns move to `pkg/hooks/safety.go` as `DefaultSafetyHook`. No circular dependency risk, since hooks is a leaf package.
- Risk: `pkg/runtime/*` sub-packages push the directory count above 11. Mitigation: The PRD counts "packages", not "directories". `runtime/skills` and `runtime/subagents` are logically one group (`commands` is deleted). If the count is strict, flatten to `pkg/skills` and `pkg/subagents`.
- Risk: Prompt compression may change summary wording across runs. Mitigation: Keep the preserved tail verbatim, bound the summary size, and make compaction testable with a stub model.
- Risk: Removing `BeforeModel`/`AfterModel` breaks trace middleware that observed model requests/responses. Mitigation: Trace middleware uses `BeforeAgent` (sees the full request) and `AfterAgent` (sees the full response, including usage). Model-level detail is available via the `State.ModelInput`/`State.ModelOutput` fields populated by the agent loop.
This architecture document is optimized for autonomous agent consumption. Every convention is mechanical. Every decision traces to a PRD requirement.