feat: Complete web_search Tavily API integration #53
Merged
Update cost estimation baselines to clarify:
- All costs are in USD cents (not abstract credits)
- Minimum goal budget: $50-100 USD per goal
- Per-task budgets: $25-50 minimum with inference overhead
- Explicit examples of realistic plan costs
This fixes the issue where planner was returning $2.00 budgets when SOUL.md
and GOVERNANCE.md specify $50-100 per goal. Model will now allocate costs
aligned with documented budget policy.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
PROBLEM: Agent stuck in indecision loop during orchestrator planning phase.
When orchestrator entered "classifying/planning" phase, agent received vague
instruction to "do solo work" but had no clear guidance. This caused 87+
repetitive thought chunks looping on: "Should I wait or do solo work?"
ROOT CAUSE: System prompt had ambiguous guidance. Agent's genesis prompt
included the active goal, so agent kept thinking about a goal with $0.00
budget while orchestrator planned it. No state change occurred, causing
infinite loop of identical reasoning before model gave up with malformed
tool calls.
SOLUTION: Added getPhaseSpecificGuidance() function that:
1. Detects when orchestrator is in classifying/planning/plan_review phases
2. Injects explicit "IMMEDIATE SOLO WORK DIRECTIVE" into system prompt
3. Provides 5 concrete, different actions (WORKLOG update, research, etc.)
4. Explicitly forbids agent from thinking about the active goal
VERIFICATION: After fix deployment, agent reduced from 80+ repetitive thought
chunks to 11 analytical chunks. Agent moved through executing phase without
getting stuck, delegating work to children instead of looping.
- Added getPhaseSpecificGuidance(db) function with detailed comments
- Integrated into buildSystemPrompt() flow after orchestrator status injection
- Guidance only activates during busy orchestrator phases (no impact on other phases)
- No tool changes, no behavior changes to agent decision logic, purely guidance injection
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
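A minimal sketch of the phase-gated guidance described in this commit. It is not the actual `getPhaseSpecificGuidance(db)` from the repository: the real function reads orchestrator state from `db`, whereas this hypothetical `phaseGuidanceSketch` takes the phase name directly, and the directive wording is illustrative. Only the two actions named in the commit message are listed; the remaining three are not reproduced here.

```typescript
// Phases the commit message treats as "busy" (orchestrator is still planning).
const BUSY_PHASES = new Set<string>(["classifying", "planning", "plan_review"]);

// Hypothetical stand-in for getPhaseSpecificGuidance(db): takes the phase
// string directly instead of querying a database.
function phaseGuidanceSketch(phase: string): string {
  // Outside the busy phases, inject nothing so other phases are unaffected.
  if (!BUSY_PHASES.has(phase)) {
    return "";
  }
  // Illustrative wording only; the real directive text is not shown in the PR.
  return [
    "IMMEDIATE SOLO WORK DIRECTIVE",
    "The orchestrator is busy. Do not reason about the active goal.",
    "Pick one concrete action now, for example:",
    "- Update the WORKLOG",
    "- Do background research",
  ].join("\n");
}

// Usage: append the guidance to the system prompt only when it is non-empty.
const basePrompt = "You are an agent.";
const guidance = phaseGuidanceSketch("planning");
const systemPrompt = guidance ? `${basePrompt}\n\n${guidance}` : basePrompt;
console.log(systemPrompt);
```

Returning an empty string for non-busy phases keeps the injection purely additive, which matches the commit's claim that other phases see no change.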
PROBLEM: Worker getting 400 "invalid function arguments json string" error
from MiniMax when sending reconstructed messages with tool calls.
ROOT CAUSE: Double-encoding bug in buildContextMessages() on line 183.
When tool call arguments come from MiniMax inference response, they are
already a JSON string. Calling JSON.stringify() on a string value results
in double-encoded JSON:
Before: {"key":"value"}
After: "{\"key\":\"value\"}"
When we send this back to MiniMax as a tool call argument, it rejects it
as malformed JSON because the value is a string literal, not an object.
SOLUTION: Type-check tc.arguments before stringifying.
If already a string, pass through as-is. Only stringify if it's an object.
This complements the thought-loop fix by ensuring that even if the model
gets confused and generates edge-case scenarios, tool arguments are properly
formatted when reconstructing the message history.
- Added conditional: `typeof tc.arguments === "string" ? tc.arguments : JSON.stringify(...)`
- Prevents double-encoding while maintaining backward compatibility with object arguments
- Minimal change, zero side effects
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
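The type check described above can be sketched as follows. The `ToolCall` interface and the helper name `serializeToolArguments` are assumptions made for illustration; only the conditional itself comes from the commit message.

```typescript
// Hypothetical shape of a tool call; only `arguments` matters for this sketch.
interface ToolCall {
  name: string;
  arguments: string | Record<string, unknown>;
}

// Return the arguments as a single-encoded JSON string.
// Strings are assumed to already be JSON and are passed through untouched;
// objects are stringified exactly once.
function serializeToolArguments(tc: ToolCall): string {
  return typeof tc.arguments === "string"
    ? tc.arguments
    : JSON.stringify(tc.arguments);
}

const fromProvider: ToolCall = { name: "web_search", arguments: '{"key":"value"}' };
const fromObject: ToolCall = { name: "web_search", arguments: { key: "value" } };

// Both forms end up as the same single-encoded JSON text.
console.log(serializeToolArguments(fromProvider));
console.log(serializeToolArguments(fromObject));

// The old behaviour: stringifying a string wraps it in quotes and escapes it,
// so parsing the result yields a string rather than an object.
const doubleEncoded = JSON.stringify(fromProvider.arguments);
console.log(typeof JSON.parse(doubleEncoded));
```

Note that the pass-through branch trusts the provider to have sent valid JSON; it does not validate the string.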
…nance policy
Root cause: Hardcoded fallback values in orchestrator.ts used
estimatedCostCents: 200 (cents, = $2.00) when the planner failed or returned
no tasks. This fallback was not updated when cost estimation policy changed
to require $50-100 minimum budgets.
Solution: Update both fallback cases (planner error + empty tasks) to use
estimatedCostCents: 5000 ($50 minimum), aligning with the $50-100 USD budget
policy specified in SOUL.md and GOVERNANCE.md.
This ensures that even when the planner fails, goals created fall back to
realistic budgets rather than the old $2 placeholder.
Fixes agent seeing 'goal created with $2 budget' messages despite cost
estimation policy fix in ff7da4e.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
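A rough sketch of the two fallback paths, assuming a simplified task shape. `PlannedTask`, `withFallbackBudget`, and `formatUsd` are hypothetical names; only the `estimatedCostCents` field and the values 200 and 5000 come from the commit message.

```typescript
// $50 expressed in cents, the minimum named in the commit message.
const MIN_GOAL_BUDGET_CENTS = 5000;

// Hypothetical task shape; the real type in orchestrator.ts is not shown.
interface PlannedTask {
  title: string;
  estimatedCostCents: number;
}

// Covers both fallback cases: the planner failed (null) or returned no tasks.
function withFallbackBudget(tasks: PlannedTask[] | null): PlannedTask[] {
  if (tasks === null || tasks.length === 0) {
    return [{ title: "fallback task", estimatedCostCents: MIN_GOAL_BUDGET_CENTS }];
  }
  return tasks;
}

// Costs are stored as integer cents; convert only when displaying.
function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

console.log(formatUsd(200));
console.log(formatUsd(withFallbackBudget(null)[0].estimatedCostCents));
console.log(formatUsd(withFallbackBudget([])[0].estimatedCostCents));
```

Keeping the minimum in one named constant is one way to avoid the drift this commit fixes, where a policy change left an old literal behind.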
Documents the minimum budget requirement for all goals and specifies how
budgets are determined in both planner success and fallback paths. Includes
reference to commit 2d28a3b which fixed fallback values.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…ning
- Implement Tavily REST API integration for web_search tool
- Add HTTP mocking with vi.mock() for offline deterministic tests
- Implement search_type parameter mapping (all/news/research/code)
- Add defensive URL parsing with try-catch error handling
- Add response validation for Tavily API structure
- Use dependency injection (ctx.config) for test isolation
- Update all test cases to validate real search results
- Add integration test coverage for tool registration
This completes the web_search tool from stub (empty results) to
production-ready Tavily-powered search with comprehensive error handling and
test coverage. Code review approved.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
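Three of the pieces listed above (search_type mapping, defensive URL parsing, response validation) might look roughly like this. The PR does not show the actual mapping, so the choices in `mapSearchType` are assumptions, as are all function and type names here. `topic` and `search_depth` are Tavily request parameters; the response is assumed to carry a `results` array whose items have `title`, `url`, and `content`. The HTTP call itself is left out so the sketch runs offline.

```typescript
type SearchType = "all" | "news" | "research" | "code";

interface TavilyParams {
  topic: "general" | "news";
  search_depth: "basic" | "advanced";
}

// Assumed mapping; the PR only says that a mapping exists.
function mapSearchType(t: SearchType): TavilyParams {
  switch (t) {
    case "news":
      return { topic: "news", search_depth: "basic" };
    case "research":
      return { topic: "general", search_depth: "advanced" };
    default: // "all" and "code" share the general/basic settings in this sketch
      return { topic: "general", search_depth: "basic" };
  }
}

// Defensive URL parsing: a malformed URL yields null instead of throwing.
function safeHostname(url: string): string | null {
  try {
    return new URL(url).hostname;
  } catch {
    return null;
  }
}

interface SearchResult {
  title: string;
  url: string;
  snippet: string;
  domain: string | null;
}

// Type guard for one raw result item.
function isRawResult(v: unknown): v is { title: string; url: string; content?: unknown } {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return typeof r.title === "string" && typeof r.url === "string";
}

// Validate the top-level shape, then keep only well-formed items.
function parseTavilyResponse(body: unknown): SearchResult[] {
  if (typeof body !== "object" || body === null) {
    throw new Error("Tavily response is not an object");
  }
  const results = (body as { results?: unknown }).results;
  if (!Array.isArray(results)) {
    throw new Error("Tavily response has no results array");
  }
  return results.filter(isRawResult).map((r) => ({
    title: r.title,
    url: r.url,
    snippet: typeof r.content === "string" ? r.content : "",
    domain: safeHostname(r.url),
  }));
}
```

Because these functions are pure, they can be unit-tested without the vi.mock() HTTP mocking the commit adds for the network layer.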
Summary
Completes the web_search tool implementation by wiring it to the Tavily Search API, replacing the previous stub that returned empty results.
Key Changes:
Architecture
Uses ctx.config.discovery.tavilyApiKey (injected at runtime) instead of loadConfig() for testability
Testing
Deployment
Fixes #52 (API discovery tool) - web_search now functional
🤖 Generated with Claude Code