Commit cce0d15

More tool improvements and addition of eval CLI (#90)
## Summary by CodeRabbit

* **New Features**
  * Added support for complex web interactions including shadow DOM and nested iframe navigation
  * Introduced intelligent caching for faster action execution and search patterns
  * Enhanced page analysis with frame-aware element identification and accessibility mapping
  * Added schema-based data extraction with JavaScript selector caching
* **Improvements**
  * Better handling of multi-frame and shadow DOM environments
  * Optimized performance through pattern caching and reuse
  * Enhanced environment compatibility for broader deployment contexts
1 parent e06cb81 commit cce0d15

117 files changed

Lines changed: 29540 additions & 2266 deletions

.env.example

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# API Keys for Eval Runner
+# Copy this file to .env and fill in your keys
+
+# Agent LLM providers
+CEREBRAS_API_KEY=your-cerebras-api-key
+OPENAI_API_KEY=your-openai-api-key
+ANTHROPIC_API_KEY=your-anthropic-api-key
+
+# Optional: Braintrust for experiment tracking
+BRAINTRUST_API_KEY=your-braintrust-api-key

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -1,4 +1,5 @@
 .DS_Store
+.env
 .git_cl_description_backup
 *.ctc.json
 *.Makefile
@@ -59,4 +60,6 @@ test/perf/.generated
 
 # Dependencies
 node_modules/
-**/.idea/
+**/.idea/
+node_modules/**
+eval-logs/**

CLAUDE.md

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+**Browser Operator** is an AI-native browser built on Chrome DevTools frontend. It adds a multi-agent AI framework to the DevTools panel, enabling intelligent automation and web interaction through specialized AI agents.
+
+## Build & Development Commands
+
+### Initial Setup
+
+```bash
+# Prerequisites: depot_tools in PATH (https://chromium.googlesource.com/chromium/tools/depot_tools.git)
+gclient sync
+npm install
+cp .env.example .env # Configure API keys
+```
+
+### Build
+
+```bash
+npm run build # Standard build (runs gn gen automatically)
+npm run build -- --watch # Watch mode for development
+npm run build -- -t Debug # Build to out/Debug instead of out/Default
+
+# Fast build (skip type checking and bundling)
+gn gen out/fast-build --args="devtools_skip_typecheck=true devtools_bundle=false"
+npm run build -- -t fast-build
+```
+
+### Running DevTools with Custom Build
+
+```bash
+# Terminal 1: Build with watch
+npm run build -- --watch
+
+# Terminal 2: Serve the built files
+cd out/Default/gen/front_end && python3 -m http.server 9000
+
+# Terminal 3: Launch Browser Operator with custom DevTools
+/Applications/Browser\ Operator.app/Contents/MacOS/Browser\ Operator \
+  --disable-infobars \
+  --custom-devtools-frontend=http://localhost:9000/ \
+  --remote-debugging-port=9222
+```
+
+### Testing
+
+```bash
+npm run test # Unit tests (Karma/Mocha)
+npm run webtest # E2E tests (Puppeteer)
+npm run debug-webtest -- --spec=path/to/test # Debug specific test
+npm run lint # ESLint
+```
+
+### Eval Runner (Agent Testing)
+
+**Recommended: Use the eval-runner-analyst agent** to run evals and get detailed analysis:
+
+```
+# In Claude Code, use the Task tool with eval-runner-analyst agent:
+"Run the action agent evals with cerebras gpt-oss-120b"
+"Test action-agent-checkbox-001 and action-agent-form-001"
+"Compare V0 and V1 action agents on iframe tests"
+```
+
+The eval-runner-analyst agent handles the complete workflow: running tests, collecting results, and providing detailed analysis of pass/fail patterns.
+
+**Manual CLI usage** (if needed):
+
+The eval runner automatically loads environment variables from `.env` in the project root.
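How the runner loads `.env` is not shown in this commit view. As a rough sketch of the `KEY=value` format that `.env.example` uses, the hypothetical `parseEnv` helper below turns file contents into a key/value record. It is an illustration only, not the eval runner's loader, and it does not handle quoting or inline comments.

```typescript
// Hypothetical helper: parses KEY=value lines, skipping blank lines and # comments.
function parseEnv(contents: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) {
      continue;
    }
    const separator = line.indexOf('=');
    if (separator <= 0) {
      continue; // Not a KEY=value pair; ignore it.
    }
    const key = line.slice(0, separator).trim();
    const value = line.slice(separator + 1).trim();
    result[key] = value;
  }
  return result;
}

// Sample input mirroring the layout of .env.example.
const sample = [
  '# Agent LLM providers',
  'CEREBRAS_API_KEY=your-cerebras-api-key',
  '',
  'OPENAI_API_KEY=your-openai-api-key',
].join('\n');

const parsedEnv = parseEnv(sample);
console.log(Object.keys(parsedEnv)); // logs the two key names
```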
+
+```bash
+# Run agent evaluations (launches headless Chrome by default)
+npx tsx scripts/eval-runner/cli.ts --tool action_agent --verbose
+npx tsx scripts/eval-runner/cli.ts --test action-agent-click-001 --verbose
+
+# Use Cerebras for fast inference (preferred models: zai-glm-4.6, gpt-oss-120b)
+npx tsx scripts/eval-runner/cli.ts --provider cerebras --model zai-glm-4.6 --tool action_agent
+npx tsx scripts/eval-runner/cli.ts --provider cerebras --model gpt-oss-120b --tool action_agent
+
+# Run V0 agent variant
+npx tsx scripts/eval-runner/cli.ts --tool action_agent --tool-override action_agent_v0 --provider cerebras --model gpt-oss-120b
+
+# Connect to running Browser Operator (bypasses bot detection, uses authenticated sessions)
+npx tsx scripts/eval-runner/cli.ts --tool action_agent --remote-debugging-port 9222 --verbose
+
+# Run with visible browser
+npx tsx scripts/eval-runner/cli.ts --tool action_agent --no-headless
+```
+
+**Note:** The LLM judge defaults to OpenAI (`gpt-4o`) regardless of agent provider. Override with `--judge-provider` and `--judge-model`.
+
+## Architecture
+
+### DevTools Module Hierarchy
+
+```
+front_end/
+├── core/ # Shared utilities, CDP backend integration
+├── models/ # Business logic, data handling
+├── panels/ # High-level panels (one per DevTools tab)
+├── ui/components/ # Reusable UI components
+└── entrypoints/ # Application entrypoints (devtools_app.ts)
+```
+
+Visibility rules: `core/` → `models/` → `panels/` → `entrypoints/` (enforced by GN build)
+
+### AI Chat Panel (`front_end/panels/ai_chat/`)
+
+```
+ai_chat/
+├── agent_framework/ # Agent execution engine
+│   ├── AgentRunner.ts # LLM loop, tool execution, handoffs
+│   ├── ConfigurableAgentTool.ts # Agent definition via config objects
+│   └── implementation/ # Concrete agent configs (ActionAgent, etc.)
+├── LLM/ # Provider integrations
+│   ├── LLMClient.ts # Client facade
+│   ├── LLMProviderRegistry.ts # Provider management
+│   └── *Provider.ts # OpenAI, Cerebras, Anthropic, Groq, etc.
+├── cdp/ # Chrome DevTools Protocol adapters
+│   ├── CDPSessionAdapter.ts # Abstract CDP interface
+│   ├── DirectCDPAdapter.ts # Direct CDP connection (eval runner)
+│   └── SDKTargetAdapter.ts # DevTools SDK integration
+├── tools/ # Agent tools (~30 tools for browser actions)
+├── dom/ # Element resolution (shadow DOM, iframes)
+├── common/ # Shared utilities (geometry, mouse, xpath)
+├── core/ # Orchestration, LLMConfigurationManager
+├── evaluation/ # Test case definitions
+└── ui/ # Chat panel UI components
+```
+
+### Key Concepts
+
+**Agent Framework**
+- `ConfigurableAgentTool`: Agents defined via config (name, prompt, tools, schema, handoffs)
+- `AgentRunner`: Executes agent loop - LLM calls, tool execution, agent handoffs
+- `ToolRegistry`: Central registry for tools/agents (`ToolRegistry.registerToolFactory()`)
+- Handoffs: Agents transfer to specialists via LLM tool calls or max iterations
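The loop-and-handoff behaviour listed above can be pictured with a small simulation. Everything below is a hypothetical sketch: `StepResult`, `LoopOutcome`, `runAgentLoop`, and the `max_iterations` trigger name are assumptions made for illustration (only `llm_tool_call` appears in the config example in this file), and the "LLM" is a plain function rather than a real model call. It is not the `AgentRunner` implementation.

```typescript
// Hypothetical result of one model turn.
type StepResult =
  | { kind: 'final'; answer: string }
  | { kind: 'tool_call'; tool: string }
  | { kind: 'handoff'; targetAgentName: string };

// Hypothetical outcome of the whole loop.
type LoopOutcome =
  | { status: 'done'; answer: string; iterations: number }
  | {
      status: 'handoff';
      targetAgentName: string;
      trigger: 'llm_tool_call' | 'max_iterations';
      iterations: number;
    };

function runAgentLoop(
  step: (iteration: number) => StepResult,
  maxIterations: number,
  fallbackAgent: string,
): LoopOutcome {
  for (let i = 1; i <= maxIterations; i++) {
    const result = step(i);
    if (result.kind === 'final') {
      return { status: 'done', answer: result.answer, iterations: i };
    }
    if (result.kind === 'handoff') {
      // The model itself asked to transfer to a specialist.
      return { status: 'handoff', targetAgentName: result.targetAgentName, trigger: 'llm_tool_call', iterations: i };
    }
    // kind === 'tool_call': a real runner would execute the tool and feed the result back.
  }
  // Iteration budget exhausted without a final answer: transfer to the fallback agent.
  return { status: 'handoff', targetAgentName: fallbackAgent, trigger: 'max_iterations', iterations: maxIterations };
}

// A stand-in model that keeps calling tools and never finishes.
const stuck = runAgentLoop(() => ({ kind: 'tool_call', tool: 'perform_action' }), 3, 'specialist_agent');
console.log(stuck.status, stuck.status === 'handoff' ? stuck.trigger : '');
```

Modelling both triggers as one `handoff` outcome keeps the caller's handling uniform, whichever way the transfer was initiated.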
+
+**CDP Adapters** - Abstraction layer for Chrome DevTools Protocol:
+- `SDKTargetAdapter`: Used when running inside DevTools (has SDK access)
+- `DirectCDPAdapter`: Used by eval runner (connects via chrome-remote-interface)
+- Both implement `CDPSessionAdapter` interface with `getAgent(domain)` method
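A minimal sketch of the adapter abstraction described above. Only the `CDPSessionAdapter` name and its `getAgent(domain)` method come from this file; the `DomainAgent` shape, the `InMemoryAdapter` class, and the `evaluate` helper are hypothetical stand-ins, and the real signatures may differ.

```typescript
// Hypothetical shape of a domain agent: a set of async CDP commands keyed by name.
type DomainAgent = Record<string, (params?: unknown) => Promise<unknown>>;

// Interface name and getAgent(domain) come from CLAUDE.md; the signature is assumed.
interface CDPSessionAdapter {
  getAgent(domain: string): DomainAgent;
}

// Hypothetical in-memory adapter used here in place of a real browser connection.
class InMemoryAdapter implements CDPSessionAdapter {
  private readonly agents: Map<string, DomainAgent>;

  constructor(agents: Record<string, DomainAgent>) {
    this.agents = new Map(Object.entries(agents));
  }

  getAgent(domain: string): DomainAgent {
    const agent = this.agents.get(domain);
    if (agent === undefined) {
      throw new Error(`No agent registered for CDP domain "${domain}"`);
    }
    return agent;
  }
}

// Caller code depends only on the interface, so either adapter could be passed in.
async function evaluate(adapter: CDPSessionAdapter, expression: string): Promise<unknown> {
  const command = adapter.getAgent('Runtime')['evaluate'];
  if (command === undefined) {
    throw new Error('Runtime.evaluate is not available on this adapter');
  }
  return command({ expression });
}

// The fake Runtime agent returns a canned response instead of talking to a browser.
const adapter = new InMemoryAdapter({
  Runtime: { evaluate: async () => ({ result: { value: 2 } }) },
});
evaluate(adapter, '1 + 1').then((response) => console.log(response));
```

Because tool code only sees the interface, the same tool can run inside DevTools or under the eval runner by swapping the adapter passed to it.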
+
+**LLM Configuration** (via `LLMConfigurationManager`):
+- 3-tier models: Main (powerful), Mini (fast), Nano (simple tasks)
+- Override system: Per-request overrides for eval without affecting localStorage
+- Providers: openai, cerebras, anthropic, groq, openrouter, litellm
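The override idea can be sketched as a pure merge: a per-request override is layered over the stored settings, and the stored object is never written to. `ModelConfig` and `resolveModelConfig` below are hypothetical names chosen for illustration, not the `LLMConfigurationManager` API, and the stored model names are placeholders.

```typescript
// Hypothetical config shape covering the three tiers described above.
interface ModelConfig {
  provider: string;
  mainModel: string;
  miniModel: string;
  nanoModel: string;
}

// Hypothetical resolver: override values win for this request only,
// and the stored config object is left untouched.
function resolveModelConfig(
  stored: Readonly<ModelConfig>,
  override: Partial<ModelConfig> = {},
): ModelConfig {
  const resolved: ModelConfig = { ...stored };
  for (const key of Object.keys(override) as Array<keyof ModelConfig>) {
    const value = override[key];
    if (value !== undefined) {
      resolved[key] = value;
    }
  }
  return resolved;
}

// Placeholder stored settings; the override reuses names from the CLI examples.
const storedConfig: ModelConfig = {
  provider: 'openai',
  mainModel: 'main-model-placeholder',
  miniModel: 'mini-model-placeholder',
  nanoModel: 'nano-model-placeholder',
};
const evalConfig = resolveModelConfig(storedConfig, { provider: 'cerebras', mainModel: 'gpt-oss-120b' });
console.log(evalConfig.provider, storedConfig.provider);
```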
+
+### Adding a New Agent
+
+```typescript
+// In implementation/ConfiguredAgents.ts
+function createMyAgentConfig(): AgentToolConfig {
+  return {
+    name: 'my_agent',
+    description: 'What this agent does',
+    systemPrompt: 'Instructions for agent behavior',
+    tools: ['navigate_url', 'perform_action'], // Registered tool names
+    schema: { /* JSON schema for input */ },
+    handoffs: [{ targetAgentName: 'specialist_agent', trigger: 'llm_tool_call' }],
+    maxIterations: 10,
+  };
+}
+
+// Register in initializeConfiguredAgents()
+const myAgent = new ConfigurableAgentTool(createMyAgentConfig());
+ToolRegistry.registerToolFactory('my_agent', () => myAgent);
+```
+
+### Adding a New Tool
+
+Tools implement the `Tool` interface with `name`, `description`, `schema`, and `execute()`. Register via `ToolRegistry.registerToolFactory()`.
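As a rough illustration of that contract, the sketch below defines a tool and registers it through a factory. Only the four member names and `registerToolFactory()` come from this file; the type signatures, the `SketchToolRegistry` stand-in, its `create` lookup, and the `word_count` tool are all assumptions made so the example runs on its own.

```typescript
// Assumed shape: only the four member names come from CLAUDE.md.
interface Tool {
  name: string;
  description: string;
  schema: Record<string, unknown>;
  execute(args: Record<string, unknown>): Promise<unknown>;
}

// Local stand-in for the project's ToolRegistry so the sketch is self-contained.
class SketchToolRegistry {
  private static readonly factories = new Map<string, () => Tool>();

  static registerToolFactory(name: string, factory: () => Tool): void {
    SketchToolRegistry.factories.set(name, factory);
  }

  // Hypothetical lookup helper, added only so the example can show a round trip.
  static create(name: string): Tool | undefined {
    const factory = SketchToolRegistry.factories.get(name);
    return factory ? factory() : undefined;
  }
}

// Pure helper kept separate from execute() so it is easy to test.
function countWords(text: string): number {
  return text.trim().split(/\s+/).filter((word) => word.length > 0).length;
}

// Hypothetical example tool; 'word_count' is not a tool from this repository.
class WordCountTool implements Tool {
  name = 'word_count';
  description = 'Counts whitespace-separated words in a string';
  schema: Record<string, unknown> = {
    type: 'object',
    properties: { text: { type: 'string' } },
    required: ['text'],
  };

  async execute(args: Record<string, unknown>): Promise<unknown> {
    const text = args['text'];
    if (typeof text !== 'string') {
      return { error: 'text must be a string' };
    }
    return { words: countWords(text) };
  }
}

SketchToolRegistry.registerToolFactory('word_count', () => new WordCountTool());
```

Registering a factory rather than an instance lets the registry defer construction until a tool is actually requested.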
+
+### Eval Runner Architecture
+
+```
+scripts/eval-runner/
+├── cli.ts # CLI entry point
+├── TestRunner.ts # Test orchestration
+├── BrowserExecutor.ts # Puppeteer/CDP automation
+├── AgentBridge.ts # Connects runner to agent tools
+├── LLMJudge.ts # LLM-based evaluation scoring
+└── reporters/ # Console, JSON, Markdown output
+```
+
+Test cases defined in `front_end/panels/ai_chat/evaluation/test-cases/`.
+
+## Environment Variables
+
+```bash
+OPENAI_API_KEY=... # OpenAI
+CEREBRAS_API_KEY=... # Cerebras (fast inference)
+ANTHROPIC_API_KEY=... # Anthropic
+BRAINTRUST_API_KEY=... # Experiment tracking (optional)
+```
+
+## Key Patterns
+
+- **Lazy loading**: Features dynamically imported via `*-meta.ts` files
+- **GN build system**: Visibility rules enforce module boundaries; edit BUILD.gn when adding files
+- **EventBus**: Uses `Common.ObjectWrapper.ObjectWrapper` for DevTools-compatible events
+- **Shadow DOM/iframe support**: `EnhancedElementResolver` and `buildBackendIdMaps()` handle composed trees
+- **Node ID mapping**: Accessibility tree `nodeId` differs from DOM `backendDOMNodeId`; use mapping utilities
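To make the node ID point concrete, the sketch below builds a lookup from accessibility `nodeId` values to DOM backend node IDs. `AXNodeLike` mirrors only the two relevant fields of a CDP accessibility node, and `mapAxToBackendIds` is a hypothetical helper written for this example; it is not the project's `buildBackendIdMaps()`.

```typescript
// Minimal subset of a CDP accessibility node: nodeId is a string,
// backendDOMNodeId is an optional integer.
interface AXNodeLike {
  nodeId: string;
  backendDOMNodeId?: number;
}

// Hypothetical helper: maps AX nodeId to the DOM backend node id, when one exists.
function mapAxToBackendIds(nodes: readonly AXNodeLike[]): Map<string, number> {
  const axToBackend = new Map<string, number>();
  for (const node of nodes) {
    if (node.backendDOMNodeId !== undefined) {
      axToBackend.set(node.nodeId, node.backendDOMNodeId);
    }
  }
  return axToBackend;
}

// Sample nodes; the second has no DOM counterpart and is skipped.
const axNodes: AXNodeLike[] = [
  { nodeId: '1', backendDOMNodeId: 42 },
  { nodeId: '2' },
  { nodeId: '3', backendDOMNodeId: 57 },
];
const idMap = mapAxToBackendIds(axNodes);
console.log(idMap.get('3')); // 57
```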

config/gni/devtools_grd_files.gni

Lines changed: 43 additions & 0 deletions
@@ -777,10 +777,44 @@ grd_files_bundled_sources = [
   "front_end/panels/ai_chat/tools/mini_app/LaunchMiniAppTool.js",
   "front_end/panels/ai_chat/tools/mini_app/ListMiniAppsTool.js",
   "front_end/panels/ai_chat/tools/mini_app/UpdateMiniAppStateTool.js",
+  "front_end/panels/ai_chat/tools/DOMToolsRegistration.js",
+  "front_end/panels/ai_chat/tools/HybridAccessibilityTreeTool.js",
+  "front_end/panels/ai_chat/tools/CachedSchemaExtractorTool.js",
+  "front_end/panels/ai_chat/tools/GetAccessibilityTreeToolV0.js",
+  "front_end/panels/ai_chat/tools/SearchTool.js",
+  "front_end/panels/ai_chat/tools/TryCachedActionTool.js",
+  "front_end/panels/ai_chat/tools/action_cache/ActionPatternCache.js",
+  "front_end/panels/ai_chat/tools/action_cache/ActionPatternCapture.js",
+  "front_end/panels/ai_chat/tools/action_cache/types.js",
+  "front_end/panels/ai_chat/tools/search/SearchPatternCache.js",
+  "front_end/panels/ai_chat/tools/search/SearchStrategy.js",
+  "front_end/panels/ai_chat/tools/search/types.js",
+  "front_end/panels/ai_chat/tools/selector_cache/SelectorCache.js",
+  "front_end/panels/ai_chat/tools/selector_cache/types.js",
+  "front_end/panels/ai_chat/a11y/FrameRegistry.js",
+  "front_end/panels/ai_chat/a11y/HybridSnapshot.js",
+  "front_end/panels/ai_chat/a11y/HybridSnapshotUniversal.js",
+  "front_end/panels/ai_chat/dom/ComposedTreeResolver.js",
+  "front_end/panels/ai_chat/dom/ElementResolver.js",
+  "front_end/panels/ai_chat/dom/EnhancedElementResolver.js",
+  "front_end/panels/ai_chat/dom/ShadowPiercer.js",
+  "front_end/panels/ai_chat/dom/shadow-piercer-runtime.js",
+  "front_end/panels/ai_chat/dom/index.js",
+  "front_end/panels/ai_chat/cdp/CDPSessionAdapter.js",
+  "front_end/panels/ai_chat/cdp/DirectCDPAdapter.js",
+  "front_end/panels/ai_chat/cdp/FrameRegistryUniversal.js",
+  "front_end/panels/ai_chat/cdp/SDKTargetAdapter.js",
+  "front_end/panels/ai_chat/cdp/getAdapter.js",
+  "front_end/panels/ai_chat/cdp/index.js",
   "front_end/panels/ai_chat/common/utils.js",
+  "front_end/panels/ai_chat/common/utils-universal.js",
+  "front_end/panels/ai_chat/common/xpath-builder.js",
+  "front_end/panels/ai_chat/common/geometry-helpers.js",
+  "front_end/panels/ai_chat/common/mouse-helpers.js",
   "front_end/panels/ai_chat/common/log.js",
   "front_end/panels/ai_chat/common/context.js",
   "front_end/panels/ai_chat/common/page.js",
+  "front_end/panels/ai_chat/common/accessibility-tree-search.js",
   "front_end/panels/ai_chat/mini_apps/GenericMiniAppBridge.js",
   "front_end/panels/ai_chat/mini_apps/MiniAppEventBus.js",
   "front_end/panels/ai_chat/mini_apps/MiniAppInitialization.js",
@@ -817,6 +851,7 @@ grd_files_bundled_sources = [
   "front_end/panels/ai_chat/agent_framework/AgentRunnerEventBus.js",
   "front_end/panels/ai_chat/agent_framework/AgentSessionTypes.js",
   "front_end/panels/ai_chat/agent_framework/ConfigurableAgentTool.js",
+  "front_end/panels/ai_chat/agent_framework/RuntimeContext.js",
   "front_end/panels/ai_chat/agent_framework/implementation/ConfiguredAgents.js",
   "front_end/panels/ai_chat/agent_framework/implementation/agents/ActionAgent.js",
   "front_end/panels/ai_chat/agent_framework/implementation/agents/ActionVerificationAgent.js",
@@ -832,6 +867,8 @@ grd_files_bundled_sources = [
   "front_end/panels/ai_chat/agent_framework/implementation/agents/ScrollActionAgent.js",
   "front_end/panels/ai_chat/agent_framework/implementation/agents/WebTaskAgent.js",
   "front_end/panels/ai_chat/agent_framework/implementation/agents/SearchAgent.js",
+  "front_end/panels/ai_chat/agent_framework/implementation/agents/ActionAgentV0.js",
+  "front_end/panels/ai_chat/agent_framework/implementation/agents/ActionAgentV2.js",
   "front_end/panels/ai_chat/common/MarkdownViewerUtil.js",
   "front_end/panels/ai_chat/evaluation/runner/VisionAgentEvaluationRunner.js",
   "front_end/panels/ai_chat/evaluation/runner/EvaluationRunner.js",
@@ -840,11 +877,17 @@ grd_files_bundled_sources = [
   "front_end/panels/ai_chat/evaluation/framework/MarkdownReportGenerator.js",
   "front_end/panels/ai_chat/evaluation/framework/types.js",
   "front_end/panels/ai_chat/evaluation/test-cases/action-agent-tests.js",
+  "front_end/panels/ai_chat/evaluation/test-cases/action-agent-shadow-dom-tests.js",
+  "front_end/panels/ai_chat/evaluation/test-cases/action-agent-iframe-tests.js",
+  "front_end/panels/ai_chat/evaluation/test-cases/cdp-tool-tests.js",
   "front_end/panels/ai_chat/evaluation/test-cases/html-to-markdown-tests.js",
+  "front_end/panels/ai_chat/evaluation/test-cases/index.js",
   "front_end/panels/ai_chat/evaluation/test-cases/research-agent-tests.js",
   "front_end/panels/ai_chat/evaluation/test-cases/schema-extractor-tests.js",
   "front_end/panels/ai_chat/evaluation/test-cases/streamlined-schema-extractor-tests.js",
   "front_end/panels/ai_chat/evaluation/test-cases/web-task-agent-tests.js",
+  "front_end/panels/ai_chat/evaluation/test-cases/web-task-agent-shadow-dom-tests.js",
+  "front_end/panels/ai_chat/evaluation/test-cases/web-task-agent-iframe-tests.js",
   "front_end/panels/ai_chat/evaluation/utils/ErrorHandlingUtils.js",
   "front_end/panels/ai_chat/evaluation/utils/EvaluationTypes.js",
   "front_end/panels/ai_chat/evaluation/utils/PromptTemplates.js",
