BeeOS integration follow-ups and rollout lifecycle hardening #31

wunitb wants to merge 7 commits into iii-hq:main from
Conversation
- tui: add AGENTOS_API_KEY auth header to all HTTP requests (fixes "Failed to parse response")
- llm-router: max_tokens → max_completion_tokens for GPT-5.4+ models
- llm-router: tool.id fallback to tool.function_id (prevents undefined.replace crash)
- agent-core: log error detail in "Chat failed" message
- shared/logger: dynamic import getContext (not in iii-sdk v0.9.0 public API)
- shared/metrics: dynamic import getContext (same issue)
- config.yaml: otel exporter stdout → memory (stdout not supported)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The AgentOS TUI uses the first agent from the dashboard stats agentList. When the dashboard is unavailable, it falls back to the agent name. Changed the fallback from 'default' to 'brain', which is the BeeOS primary agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wunitb is attempting to deploy a commit to the motia Team on Vercel. A member of the Team first needs to authorize it.
📝 Walkthrough

This PR introduces multi-stage evolution and feedback systems with shadow/canary rollout states, adds runtime controls for swarms and workflows (cancellation, status tracking), normalizes cron expressions to 6-field format, extends the LLM tool interface with optional ID fallbacks, and expands test coverage across core modules.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/eval.ts (1)
596-610: ⚠️ Potential issue | 🟠 Major

Validate and constrain `baselineFunctionId` before persisting it.

`metadata.baselineFunctionId` comes straight from the request, and line 470 later executes it via `trigger(...)`. That gives callers a second unvalidated function-id execution path inside `eval::suite`. Sanitize it and restrict it to the namespaces you actually want to benchmark before storing it. As per coding guidelines, "Validate IDs with `sanitizeId()` (alphanumeric + `_-:.`, max 256 chars)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/eval.ts` around lines 596 - 610, The metadata.baselineFunctionId value is taken directly from the request and later used in trigger(...), so validate and constrain it before persisting: call sanitizeId() on metadata.baselineFunctionId, ensure it matches allowed namespace/whitelist rules (reject or nullify if invalid or outside allowed namespaces), and only store the sanitized value into the suite.metadata.baselineFunctionId on the EvalSuite object; update any code paths that assume its presence to handle a missing/invalid baselineFunctionId safely.
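As an illustration of that guideline, a minimal sketch of the check — `sanitizeBaselineFunctionId` and the namespace allowlist below are hypothetical names for illustration, not the repo's actual helpers:

```typescript
// Hypothetical sketch: enforce the stated ID rule (alphanumeric + _-:. , max 256
// chars), then restrict the namespace before persisting baselineFunctionId.
const ID_PATTERN = /^[A-Za-z0-9_\-:.]{1,256}$/;
const ALLOWED_BASELINE_NAMESPACES = ["beeos", "evolved", "test"]; // assumed allowlist

function sanitizeBaselineFunctionId(raw: unknown): string | undefined {
  if (typeof raw !== "string" || !ID_PATTERN.test(raw)) return undefined;
  const namespace = raw.split("::")[0];
  return ALLOWED_BASELINE_NAMESPACES.includes(namespace) ? raw : undefined;
}
```

Returning `undefined` (rather than throwing) lets the caller store a missing baseline and keeps `eval::suite` usable without one.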
🧹 Nitpick comments (9)
config.yaml (1)
51-51: Use the `memory` exporter only for local/dev; keep a persistent/exporting backend for shared environments.

Switching to `memory` makes telemetry ephemeral, which weakens debugging and incident analysis outside local runs. Consider environment-based config (e.g., `memory` in dev, OTLP/collector in staging/prod).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config.yaml` at line 51, The current telemetry exporter is set to the ephemeral value "memory" (exporter: memory); change the configuration to use "memory" only for local/dev and a persistent exporter (e.g., "otlp" or "collector") for shared/staging/prod by introducing environment-aware selection (env var or profile) and defaulting to a persistent backend for non-local environments so telemetry is exported and retained for debugging and incident analysis.

src/agent-core.ts (1)
869-869: Consider safeguarding `JSON.stringify` against circular references.

If a non-`Error` value with circular references is thrown, `JSON.stringify(err)` will throw, masking the original error and preventing line 870 from executing. A safer fallback would avoid this edge case:

♻️ Suggested safer approach

```diff
- log.error("Chat failed", { agentId, sessionId, duration: chatDuration, error: err instanceof Error ? err.message : JSON.stringify(err) });
+ const errorDetail = err instanceof Error ? err.message : (() => { try { return JSON.stringify(err); } catch { return String(err); } })();
+ log.error("Chat failed", { agentId, sessionId, duration: chatDuration, error: errorDetail });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent-core.ts` at line 869, The log.error call that logs the chat failure (the expression using log.error("Chat failed", { agentId, sessionId, duration: chatDuration, error: err instanceof Error ? err.message : JSON.stringify(err) })) can throw if err is a non-Error with circular references; change it to compute a safe error string before calling log.error by either using util.inspect(err, { depth: null }) or a small safeStringify helper (try JSON.stringify and fall back to util.inspect or a catch-all string), then pass that safe string as the error field so logging cannot itself throw.

src/swarm.ts (1)
357-357: Consider validating the `reason` parameter.

The `reason` string is stored directly without length limits or content sanitization. While not a security vulnerability since it's only persisted to state/audit, unbounded input could lead to storage issues or noisy audit logs.

Suggested validation

```diff
 async (req: any) => {
   if (req.headers) requireAuth(req);
-  const { swarmId, reason = "operator cancel" } = req.body || req;
+  const { swarmId, reason: rawReason = "operator cancel" } = req.body || req;
+  const reason = typeof rawReason === "string" ? rawReason.slice(0, 500) : "operator cancel";
   const safeSwarmId = sanitizeId(swarmId);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/swarm.ts` at line 357, Validate and sanitize the incoming reason after extracting it (the current destructuring const { swarmId, reason = "operator cancel" } = req.body || req;): ensure reason is a string, trim surrounding whitespace, reject or truncate overly long input (e.g., enforce a MAX_REASON_LENGTH like 256), replace empty/blank values with the default "operator cancel", and strip or normalize control characters to avoid noisy or unbounded audit entries before persisting to state/audit.

src/__tests__/swarm.test.ts (1)
367-414: Good test coverage for happy paths; consider adding edge case tests.

The tests correctly verify basic functionality of `swarm::status` and `swarm::cancel`. To improve resilience, consider adding tests for:

- `swarm::cancel` on a non-active (dissolved/cancelled) swarm → should throw
- `swarm::cancel` on a non-existent swarm → should throw
- `swarm::status` returning `cancelledAt` and `cancelReason` for a cancelled swarm

Example edge case tests

```ts
it("rejects cancel on non-active swarm", async () => {
  const swarmId = "swarm-dissolved";
  seedKv("swarms", swarmId, {
    id: swarmId,
    goal: "done",
    agentIds: ["a1"],
    maxDurationMs: 600000,
    consensusThreshold: 0.66,
    createdAt: 1,
    status: "dissolved",
  });
  await expect(
    call("swarm::cancel", authReq({ swarmId })),
  ).rejects.toThrow("is not active");
});

it("returns cancellation metadata in status", async () => {
  const swarmId = "swarm-cancelled";
  seedKv("swarms", swarmId, {
    id: swarmId,
    goal: "test",
    agentIds: ["a1"],
    maxDurationMs: 600000,
    consensusThreshold: 0.66,
    createdAt: 1,
    status: "cancelled",
    cancelledAt: 12345,
    cancelReason: "test reason",
  });
  const result = await call("swarm::status", authReq({ swarmId }));
  expect(result.cancelledAt).toBe(12345);
  expect(result.cancelReason).toBe("test reason");
});
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/__tests__/swarm.test.ts` around lines 367 - 414, Add edge-case tests around swarm::cancel and swarm::status: create a test that seeds a non-active swarm (status "dissolved" or "cancelled") and assert call("swarm::cancel", authReq({ swarmId })) rejects with an "is not active" error; add a test that calls swarm::cancel for a non-existent swarm and asserts it rejects; and add a test that seeds a cancelled swarm with cancelledAt and cancelReason and asserts call("swarm::status", authReq({ swarmId })) returns those fields. Use the existing helpers seedKv, call, authReq, and getScope/get to validate state changes (e.g., expect rejection for cancel on non-active/non-existent swarms and expect returned cancelledAt/cancelReason in swarm::status).

src/__tests__/skillkit-bridge.test.ts (1)
117-120: Missing `mockTriggerVoid.mockClear()` in `beforeEach`.

Same inconsistency as other test files—`mockTriggerVoid` is not cleared between tests.

♻️ Proposed fix

```diff
 beforeEach(() => {
   mockTrigger.mockClear();
+  mockTriggerVoid.mockClear();
   mockExecResult = { stdout: "[]", stderr: "", exitCode: 0 };
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/__tests__/skillkit-bridge.test.ts` around lines 117 - 120, The beforeEach block in the test fails to clear mockTriggerVoid, causing test pollution; update the beforeEach (the same one that calls mockTrigger.mockClear() and sets mockExecResult) to also call mockTriggerVoid.mockClear() so both mocks are reset before each test run.

src/__tests__/streaming.test.ts (1)
48-51: Missing `mockTriggerVoid.mockClear()` in `beforeEach`.

The `mockTriggerVoid` mock is added but not cleared between tests. Other test files (e.g., `mcp-client.test.ts`) include `mockTriggerVoid.mockClear()` in `beforeEach`. This inconsistency could lead to test pollution if void-action assertions are added later.

♻️ Proposed fix

```diff
 beforeEach(() => {
   mockTrigger.mockClear();
+  mockTriggerVoid.mockClear();
   process.env.AGENTOS_API_KEY = "test-key";
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/__tests__/streaming.test.ts` around lines 48 - 51, The beforeEach setup in streaming.test.ts clears mockTrigger but omits clearing mockTriggerVoid, which can cause test pollution; update the beforeEach block to also call mockTriggerVoid.mockClear() so both mocks (mockTrigger and mockTriggerVoid) are reset before each test, ensuring consistent state across tests and matching other tests like mcp-client.test.ts.

src/__tests__/workflow.test.ts (1)
530-582: Consider using `try/finally` for mock restoration.

If this test fails before reaching line 581, `mockTrigger` won't be restored, potentially causing subsequent tests to fail with misleading errors. Wrapping in `try/finally` ensures cleanup regardless of test outcome.

♻️ Proposed fix

```diff
 it("cancels a running workflow before the next step executes", async () => {
   let releaseStep!: () => void;
   const pausedStep = new Promise<string>((resolve) => {
     releaseStep = () => resolve("step-1 complete");
   });
   const originalImpl = mockTrigger.getMockImplementation()!;
   mockTrigger.mockImplementation(async (fnId: string, data?: any) => {
     if (fnId === "pause::fn") return pausedStep;
     return originalImpl(fnId, data);
   });

-  seedKv("workflows", "cancel-wf", {
-    id: "cancel-wf",
-    ...
-  });
-
-  const running = call("workflow::run", authReq({
-    workflowId: "cancel-wf",
-    input: "go",
-  }));
-
-  await vi.waitFor(() => {
-    expect([...getScope("workflow_runs").keys()][0]).toBeDefined();
-  });
-  const runId = [...getScope("workflow_runs").keys()][0] as string;
-
-  await call("workflow::cancel", authReq({ runId }));
-  releaseStep();
-
-  await expect(running).rejects.toThrow("Workflow run cancelled");
-  expect(getScope("workflow_runs").get(runId)).toEqual(
-    expect.objectContaining({ status: "cancelled" }),
-  );
-
-  mockTrigger.mockImplementation(originalImpl);
+  try {
+    seedKv("workflows", "cancel-wf", {
+      id: "cancel-wf",
+      name: "Cancelable",
+      steps: [
+        { name: "pause", functionId: "pause::fn", mode: "sequential", timeoutMs: 5000, errorMode: "fail" },
+        { name: "after", functionId: "echo::fn", mode: "sequential", timeoutMs: 5000, errorMode: "fail" },
+      ],
+    });
+
+    const running = call("workflow::run", authReq({
+      workflowId: "cancel-wf",
+      input: "go",
+    }));
+
+    await vi.waitFor(() => {
+      expect([...getScope("workflow_runs").keys()][0]).toBeDefined();
+    });
+    const runId = [...getScope("workflow_runs").keys()][0] as string;
+
+    await call("workflow::cancel", authReq({ runId }));
+    releaseStep();
+
+    await expect(running).rejects.toThrow("Workflow run cancelled");
+    expect(getScope("workflow_runs").get(runId)).toEqual(
+      expect.objectContaining({ status: "cancelled" }),
+    );
+  } finally {
+    mockTrigger.mockImplementation(originalImpl);
+  }
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/__tests__/workflow.test.ts` around lines 530 - 582, The test replaces mockTrigger's implementation and restores it at the end but won't restore if an assertion throws; capture the original implementation (originalImpl = mockTrigger.getMockImplementation()!), wrap the portion that mocks and runs the workflow in a try block, and move the restoration (mockTrigger.mockImplementation(originalImpl)) into a finally block so mockTrigger is always restored regardless of test outcome; ensure the try includes the mocked implementation, the call("workflow::run", ...) invocation, the waitFor/expect checks and the releaseStep call.

src/context-cache.ts (1)
108-111: Use `safeCall` instead of inline `.catch` for `state::list`.

Lines 108-111 currently handle trigger failure with `.catch(() => [])`. Please switch to `safeCall` for consistent error context and repo-standard failure handling.

♻️ Proposed refactor

```diff
+import { safeCall } from "./shared/errors.js";
@@
-  const entries = (await trigger({
-    function_id: "state::list",
-    payload: { scope },
-  }).catch(() => [])) as any[];
+  const entries = (await safeCall(
+    () =>
+      trigger({
+        function_id: "state::list",
+        payload: { scope },
+      }),
+    [],
+    { operation: "context_cache_invalidate_list", functionId: "context_cache::invalidate" },
+  )) as any[];
```

As per coding guidelines: "Use `safeCall(fn, fallback, { operation })` for external calls that may fail".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/context-cache.ts` around lines 108 - 111, Replace the inline .catch fallback on the trigger call for "state::list" with the repo-standard safeCall helper: call safeCall(() => trigger({ function_id: "state::list", payload: { scope } }), [], { operation: "state::list" }) and assign its result to entries (currently declared as const entries = ...). Ensure you keep the same cast to any[] and preserve the payload object; this centralizes error context and consistent failure handling for the trigger invocation.

src/__tests__/tools-extended.test.ts (1)
38-46: Reset `mockRegisterTrigger` in `beforeEach` for deterministic assertions.

Since this spy is shared across module import and runtime calls, clearing it per test avoids cross-test coupling.

♻️ Proposed tweak

```diff
 beforeEach(() => {
   resetKv();
   mockTrigger.mockClear();
   mockTriggerVoid.mockClear();
+  mockRegisterTrigger.mockClear();
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/__tests__/tools-extended.test.ts` around lines 38 - 46, Add a beforeEach hook that clears the shared spy so tests remain deterministic: call mockRegisterTrigger.mockClear() (or mockReset()) in a beforeEach block in the test file so the vi.fn() spy assigned to mockRegisterTrigger is reset before each test; locate the mock by the symbol mockRegisterTrigger (and optionally reset handlers if needed) next to the existing vi.mock setup to ensure no cross-test coupling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tui/src/main.rs`:
- Around line 1283-1287: The agent_label resolution branch currently only looks
up agent["name"] when app.chat_agent is empty, causing fallback to "brain" if an
agent has an id but no name; change the lookup in the agent_label block to
mirror send_chat's logic by attempting agent["id"].as_str().or_else(||
agent["name"].as_str()).unwrap_or("brain") (i.e., check id first, then name,
then default) so titles match the send path; reference the agent_label variable,
app.chat_agent, app.agents, and the send_chat resolution to locate and update
the code.
- Around line 532-536: The hardcoded fallback "brain" in the agent selection
should be made configurable; replace the inline "brain" default in the agent_id
computation (where self.chat_agent, self.agents are used) with a call to a
helper like default_chat_agent() that reads an environment override (e.g.,
AGENTOS_PRIMARY_AGENT) and returns the appropriate static string ("brain" for
legacy or "default" otherwise); update or add the default_chat_agent() function
and use it in the agent_id expression so the fallback matches other surfaces
(e.g., streaming.ts) and can be overridden at deploy time.
In `@docs/reviews/sentrux/2026-04-04-022609-codex-agentos-tsc-cleanup.md`:
- Around line 6-15: The report exposes machine-local absolute paths in the lines
labeled "Worktree" and "Sentrux scan"; update the report to remove personal
machine paths by replacing those absolute paths with repo-relative paths or a
placeholder (e.g., ./worktree or <WORKTREE_PATH>) and ensure any other
occurrences of `/home/<user>/...` are scrubbed; edit the document content that
contains the "Worktree" and "Sentrux scan" entries so the artifact no longer
contains developer usernames or machine-specific locations.
In `@src/__tests__/eval.test.ts`:
- Around line 243-272: The test currently seeds a suite that references
baselineFunctionId "beeos::route-task-baseline" but mockTrigger only returns
outputs for "evolved::" and "test::" IDs, so the BeeOS baseline path is not
exercised; update the test's mocking setup (the mockTrigger used in this test
suite) to include a synthetic response for "beeos::route-task-baseline" (or
treat "beeos::" prefixes) that returns a plausible baseline output with at least
one score/value, then extend the assertions after call("eval::suite") to check
baselineAggregate contains that score/value (not just testCount), referencing
the mockTrigger and the suite seed using suiteId "suite-routing" and
baselineFunctionId "beeos::route-task-baseline".
In `@src/api.ts`:
- Around line 359-368: The api::health handler currently documents only the
healthy response shape (status, version, workers, uptime) but the shutdown path
returns {status, version, inFlight} with a 503; update the response_format for
the health endpoint (the health_response schema) to reflect both shapes—either
by defining a union/oneOf with a separate shutdown_response schema (including
status:string, version:string, inFlight:number) or by making workers and uptime
optional and adding an optional inFlight:number field, and ensure the 503
shutdown case is explicitly documented in the schema so generated clients will
handle it correctly.
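One way the two shapes could be modeled — the union type below is a sketch built from the field names in the comment above, not the repo's actual schema:

```typescript
// Sketch of a health response union covering both the healthy (200) shape and
// the shutdown (503) shape described in the comment above. Field names are
// taken from that comment; the type itself is illustrative.
type HealthyResponse = { status: string; version: string; workers: number; uptime: number };
type ShutdownResponse = { status: string; version: string; inFlight: number };
type HealthResponse = HealthyResponse | ShutdownResponse;

// Narrow by the discriminating field so clients handle the 503 case explicitly.
function isShutdown(r: HealthResponse): r is ShutdownResponse {
  return "inFlight" in r;
}
```

A generated client can then branch on `isShutdown(...)` instead of assuming `workers`/`uptime` are always present.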
In `@src/cron.ts`:
- Around line 19-22: Replace the ad-hoc .catch fallbacks around trigger(...)
with safeCall so failures are observable: wrap each problematic call (e.g., the
call assigning agents) as await safeCall(() => trigger({ function_id:
"state::list", payload: { scope: "agents" } }), [], { operation: "state::list"
}) and similarly for the other listed occurrences (lines 30-33, 65-68, 99-102)
using the appropriate operation name; remove the trailing as any[] casts and let
the inferred types flow from safeCall so variables like agents receive the
fallback array on failure instead of swallowing errors.
In `@src/eval.ts`:
- Around line 63-115: The passRate currently divides passCount by
testCases.length which ignores per-test weights; change the logic in
runSuiteAggregate (and the other aggregate helper) to track a weighted pass sum
(e.g., passWeight) by adding the test's weight when correctness >= 0.5, and
compute passRate as passWeight / totalWeight when totalWeight > 0; if
totalWeight is 0, fall back to the existing unweighted behavior (e.g., passCount
/ testCases.length or 0) to avoid division by zero.
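The weighted computation described above can be sketched as follows — the `weight`/`correctness` result shape is assumed from the comment, not lifted from the repo:

```typescript
interface TestResult {
  weight?: number;     // assumed optional per-test weight, defaulting to 1
  correctness: number; // 0..1 score; >= 0.5 counts as a pass
}

// Weighted pass rate: each passing test contributes its weight, and
// passRate = passWeight / totalWeight. When totalWeight is 0, fall back to
// the unweighted count to avoid division by zero, as the comment suggests.
function passRate(results: TestResult[]): number {
  let passCount = 0;
  let passWeight = 0;
  let totalWeight = 0;
  for (const r of results) {
    const w = r.weight ?? 1;
    totalWeight += w;
    if (r.correctness >= 0.5) {
      passCount++;
      passWeight += w;
    }
  }
  if (totalWeight > 0) return passWeight / totalWeight;
  return results.length > 0 ? passCount / results.length : 0;
}
```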
In `@src/evolve.ts`:
- Around line 287-290: The metadata object currently spreads extraMeta after
setting rolloutState, allowing callers to override it; update the code that
builds metadata (the object with metadata: { rolloutState: "draft",
...(extraMeta || {}) }) so rolloutState cannot be overridden—either move
rolloutState to be set last (e.g., metadata: { ...(extraMeta || {}),
rolloutState: "draft" }) or call syncRolloutState(record) to compute/overwrite
rolloutState on the record before persisting; ensure you reference the metadata
construction and the syncRolloutState(record) helper to enforce the invariant.
In `@src/feedback.ts`:
- Around line 106-112: The review path (feedback::review) currently fetches only
the last 5 evaluations via getRecentEvals(functionId, 5) and skips the canary
safety gate, which makes its promotion rules weaker than feedback::promote;
change review to read up to the configured minEvalsToPromote (or a max like 10
if policy allows) instead of hardcoding 5 by using the policy value returned
from getPolicy(), and reuse the same canary safety-check logic used in promote
(the canary gate guarded in the promote flow) before emitting "promote" /
"ready_for_production" so low-safety canaries are rejected consistently; update
the other occurrence (lines ~145-158) the same way.
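The policy-driven window could be sketched like this — `minEvalsToPromote` and the max of 10 come from the comment above; the clamp itself is an assumption:

```typescript
// Derive the review eval window from policy instead of hardcoding 5.
// Clamps to [1, maxWindow]; falls back to the legacy 5 when the policy
// omits minEvalsToPromote. Names are illustrative, not the repo's API.
function evalWindow(policy: { minEvalsToPromote?: number }, maxWindow = 10): number {
  const n = policy.minEvalsToPromote ?? 5;
  return Math.min(Math.max(n, 1), maxWindow);
}
```

`getRecentEvals(functionId, evalWindow(policy))` would then keep `feedback::review` and `feedback::promote` reading the same number of evaluations.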
In `@src/llm-router.ts`:
- Around line 221-228: When serializing tools in llm-router (the tools =
req.tools?.map(...) block and the similar block at 305-310), preserve a
request-local mapping from the provider-facing name back to the original tool id
so downstream dispatch can recover the original id; specifically, after
computing toolId and providerName (the value assigned to name via
toolId.replace(/::/g, "_")), add an entry into a map (e.g.,
providerNameToOriginalId) keyed by providerName with value toolId and attach
this map to the request context or the same response object so the response-path
that decodes provider names (which uses .replace(/_/g, "::")) can look up the
true original id instead of relying on lossy string transforms. Ensure both
serialization sites (the two mapped blocks) maintain the same map and that
subsequent code uses providerNameToOriginalId to resolve original ids.
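The request-local mapping described above can be sketched like this — `providerNameToOriginalId` follows the comment's naming, and the tool shape is simplified for illustration:

```typescript
// Round-trip tool ids through a provider that forbids "::" in names.
// The naive inverse .replace(/_/g, "::") is lossy for ids that already
// contain "_", so keep a per-request map from provider name to original id.
function serializeTools(ids: string[]) {
  const providerNameToOriginalId = new Map<string, string>();
  const tools = ids.map((toolId) => {
    const providerName = toolId.replace(/::/g, "_");
    providerNameToOriginalId.set(providerName, toolId);
    return { name: providerName };
  });
  return { tools, providerNameToOriginalId };
}

function resolveOriginalId(name: string, map: Map<string, string>): string {
  // Prefer the exact mapping; fall back to the lossy transform only when absent.
  return map.get(name) ?? name.replace(/_/g, "::");
}
```

For an id like `web::search_api`, the lossy transform alone would decode the provider name back to `web::search::api`; the map recovers the true id.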
- Around line 298-301: The request body currently always includes
max_completion_tokens which breaks many backends; modify the body construction
around where body: Record<string, unknown> is created (using req.model.model,
req.model.maxTokens, messages) to conditionally set the token field per
provider: if the provider is Groq (check req.provider or req.model.provider)
include max_completion_tokens: req.model.maxTokens, otherwise include
max_tokens: req.model.maxTokens; ensure you do not send both keys and update any
downstream code expecting a single token field.
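A sketch of the provider-conditional body construction — the `provider` field and the "groq" check are assumptions taken from the comment above, not the repo's actual request shape:

```typescript
// Build the chat body with exactly one token-limit key, chosen per provider.
// Per the comment: Groq takes max_completion_tokens, other backends still
// expect max_tokens; never send both keys in the same request.
function buildBody(
  model: { model: string; maxTokens: number; provider?: string },
  messages: unknown[],
): Record<string, unknown> {
  const body: Record<string, unknown> = { model: model.model, messages };
  if (model.provider === "groq") {
    body.max_completion_tokens = model.maxTokens;
  } else {
    body.max_tokens = model.maxTokens;
  }
  return body;
}
```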
In `@src/swarm.ts`:
- Around line 349-390: The handler "swarm::cancel" is written to accept
HTTP-style requests (it checks req.headers and calls requireAuth) but no HTTP
trigger is registered; add a registerTrigger entry that maps an HTTP POST to
this function so API calls reach it (e.g., registerTrigger with type "http",
function_id "swarm::cancel", config specifying api_path "api/swarm/:id/cancel"
and http_method "POST"); ensure the trigger passes the route param as swarmId
(so sanitizeId(swarmId) in swarm::cancel still works) and that auth flows remain
unchanged.
- Around line 200-232: The swarm::status registered function appears to expect
HTTP requests (it checks req.headers and calls requireAuth) but no HTTP trigger
is registered; add a registerTrigger call that maps the "swarm::status" function
to a distinct HTTP path (avoid colliding with existing "swarm::collect"), e.g.
registerTrigger with type "http", function_id "swarm::status", config using a
GET method and an api_path such as "api/swarm/:id/info"; place this trigger
registration near other trigger registrations (after the registerFunction for
swarm::status or with the other registerTrigger calls) so HTTP requests route
correctly to the requireAuth-protected handler.
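The two registrations described above might look like the following — `registerTrigger` is stubbed here because its real signature isn't shown in this PR; the paths and methods come straight from the comments:

```typescript
// Hypothetical stub of registerTrigger, only to show the shape of the two
// HTTP trigger registrations the comments describe; the real helper may differ.
type TriggerConfig = { api_path: string; http_method: string };
type TriggerEntry = { type: string; function_id: string; config: TriggerConfig };

const registered: TriggerEntry[] = [];
function registerTrigger(entry: TriggerEntry): void {
  registered.push(entry);
}

registerTrigger({
  type: "http",
  function_id: "swarm::cancel",
  config: { api_path: "api/swarm/:id/cancel", http_method: "POST" },
});
registerTrigger({
  type: "http",
  function_id: "swarm::status",
  // distinct path so it does not collide with the existing swarm::collect route
  config: { api_path: "api/swarm/:id/info", http_method: "GET" },
});
```

The route param would still need to be passed through as `swarmId` so `sanitizeId(swarmId)` in the handlers keeps working.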
In `@src/workflow.ts`:
- Around line 313-320: Validate and sanitize the incoming runId before using it:
call sanitizeId(runId) (per the ID rules) and use the sanitized value in the
trigger payload in this handler that calls trigger({ function_id: "state::get",
... }). After triggering, enforce consistent not-found behavior: if the
state::get response is null, either throw a clear NotFound error (matching
workflow::cancel behavior) or return a consistent "not found" indicator
object—choose the same approach used by workflow::cancel and apply it here. Keep
the existing requireAuth(req) check unchanged.
- Around line 329-335: In the async request handler (async (req: any) => { … })
validate and sanitize the incoming runId before using it: call sanitizeId(runId)
(or return an error if invalid) and use the sanitized id when building the
payload for trigger({ function_id: "state::get", payload: { scope:
"workflow_runs", key: ... } }); ensure this check happens after requireAuth(req)
and before invoking trigger so no unvalidated runId is sent to state operations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2d8bcad3-4996-40be-b8b5-330b886f4932
📒 Files selected for processing (34)
- config.yaml
- crates/tui/src/main.rs
- docs/reviews/sentrux/2026-04-04-022609-codex-agentos-tsc-cleanup.md
- src/__tests__/cron.test.ts
- src/__tests__/eval.test.ts
- src/__tests__/evolve.test.ts
- src/__tests__/feedback.test.ts
- src/__tests__/hand-runner.test.ts
- src/__tests__/mcp-client.test.ts
- src/__tests__/recovery.test.ts
- src/__tests__/security-headers.test.ts
- src/__tests__/session-lifecycle.test.ts
- src/__tests__/skillkit-bridge.test.ts
- src/__tests__/streaming.test.ts
- src/__tests__/swarm.test.ts
- src/__tests__/tools-extended.test.ts
- src/__tests__/workflow.test.ts
- src/agent-core.ts
- src/api.ts
- src/context-cache.ts
- src/cron.ts
- src/eval.ts
- src/evolve.ts
- src/feedback.ts
- src/hand-runner.ts
- src/llm-router.ts
- src/recovery.ts
- src/session-lifecycle.ts
- src/shared/cron.ts
- src/shared/metrics.ts
- src/swarm.ts
- src/tools-extended.ts
- src/types.ts
- src/workflow.ts
  let agent_id = if self.chat_agent.is_empty() {
      self.agents.first()
          .and_then(|a| a["id"].as_str().or(a["name"].as_str()))
-         .unwrap_or("default")
+         .unwrap_or("brain")
          .to_string()
Make fallback agent configurable to avoid cross-surface regressions.
Line 535 hardcodes "brain" when no agent is selected. src/streaming.ts still defaults missing agent context to "default" (Line 24 and Line 29 in the provided snippet), so this creates fallback drift and can break chat in environments that don’t provision a brain agent.
Proposed fix
  let agent_id = if self.chat_agent.is_empty() {
      self.agents.first()
          .and_then(|a| a["id"].as_str().or(a["name"].as_str()))
-         .unwrap_or("brain")
+         .unwrap_or(default_chat_agent())
          .to_string()
  } else {
      self.chat_agent.clone()
  };

fn default_chat_agent() -> &'static str {
    // Keeps legacy compatibility by default; BeeOS can override via env at deploy time.
    if std::env::var("AGENTOS_PRIMARY_AGENT").ok().as_deref() == Some("brain") {
        "brain"
    } else {
        "default"
    }
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/tui/src/main.rs` around lines 532 - 536, The hardcoded fallback
"brain" in the agent selection should be made configurable; replace the inline
"brain" default in the agent_id computation (where self.chat_agent, self.agents
are used) with a call to a helper like default_chat_agent() that reads an
environment override (e.g., AGENTOS_PRIMARY_AGENT) and returns the appropriate
static string ("brain" for legacy or "default" otherwise); update or add the
default_chat_agent() function and use it in the agent_id expression so the
fallback matches other surfaces (e.g., streaming.ts) and can be overridden at
deploy time.
  let agent_label = if app.chat_agent.is_empty() {
      app.agents.first()
          .and_then(|a| a["name"].as_str())
-         .unwrap_or("default")
+         .unwrap_or("brain")
  } else {
Align chat title target resolution with send path.
Line 1284 through Line 1286 resolve only name, while send_chat resolves id or name. If an agent has id but no name, the title can incorrectly show "brain".
Proposed fix
  let agent_label = if app.chat_agent.is_empty() {
      app.agents.first()
-         .and_then(|a| a["name"].as_str())
+         .and_then(|a| a["name"].as_str().or(a["id"].as_str()))
          .unwrap_or("brain")
  } else {
      &app.chat_agent
  };

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/tui/src/main.rs` around lines 1283 - 1287, The agent_label resolution
branch currently only looks up agent["name"] when app.chat_agent is empty,
causing fallback to "brain" if an agent has an id but no name; change the lookup
in the agent_label block to mirror send_chat's logic by attempting
agent["id"].as_str().or_else(|| agent["name"].as_str()).unwrap_or("brain")
(i.e., check id first, then name, then default) so titles match the send path;
reference the agent_label variable, app.chat_agent, app.agents, and the
send_chat resolution to locate and update the code.
- Worktree: `/home/wunitb/unitb_labs/wunb-agentos-beeos/.worktrees/agentos-beeos-main-runtime`
- Base commit: `4b4de07e2603d71f87bee17a91a8185dcb82e6a7`

## Commands Used

```text
npx tsc -p tsconfig.json --noEmit --pretty false
npm test -- src/__tests__/mcp-client.test.ts src/__tests__/security-headers.test.ts src/__tests__/skillkit-bridge.test.ts src/__tests__/streaming.test.ts src/__tests__/llm-router.test.ts src/__tests__/session-lifecycle.test.ts src/__tests__/cron.test.ts src/__tests__/api.test.ts
git diff --check
Sentrux scan: /home/wunitb/unitb_labs/wunb-agentos-beeos/.worktrees/agentos-beeos-main-runtime
```
Scrub machine-local paths before committing this report.
The Worktree and Sentrux scan entries expose /home/wunitb/..., which leaks a developer username and makes the artifact non-portable. Prefer repo-relative paths or a placeholder here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/reviews/sentrux/2026-04-04-022609-codex-agentos-tsc-cleanup.md` around
lines 6 - 15, The report exposes machine-local absolute paths in the lines
labeled "Worktree" and "Sentrux scan"; update the report to remove personal
machine paths by replacing those absolute paths with repo-relative paths or a
placeholder (e.g., ./worktree or <WORKTREE_PATH>) and ensure any other
occurrences of `/home/<user>/...` are scrubbed; edit the document content that
contains the "Worktree" and "Sentrux scan" entries so the artifact no longer
contains developer usernames or machine-specific locations.
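The scrub can be done mechanically before the report is committed. `scrubLocalPaths` and the `<WORKTREE_PATH>` placeholder below are illustrative choices, not existing repo tooling:

```typescript
// Replace any machine-local /home/<user>/... path with a portable placeholder
// so the review artifact carries no developer usernames or machine locations.
function scrubLocalPaths(text: string): string {
  return text.replace(/\/home\/[A-Za-z0-9_-]+\/\S*/g, "<WORKTREE_PATH>");
}
```

Running every generated review file through such a filter in the report pipeline would prevent this class of leak from recurring.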
it("returns suite metadata and baseline comparison context BeeOS can display", async () => {
  seedKv("eval_suites", "suite-routing", {
    suiteId: "suite-routing",
    name: "Routing Suite",
    functionId: "test::double",
    metadata: {
      candidateClass: "routing",
      baselineFunctionId: "beeos::route-task-baseline",
    },
    testCases: [
      {
        input: { value: 2 },
        expected: { result: 4 },
        scorer: "exact_match",
      },
    ],
  });

  const result = await call(
    "eval::suite",
    authReq({ suiteId: "suite-routing" }),
  );

  expect(result.metadata).toEqual({
    candidateClass: "routing",
    baselineFunctionId: "beeos::route-task-baseline",
    baselineAggregate: expect.objectContaining({
      testCount: 1,
    }),
  });
This BeeOS regression test doesn't actually execute the BeeOS baseline path.
mockTrigger only returns synthetic outputs for evolved:: and test:: IDs, so beeos::route-task-baseline falls through to null. That means this assertion can still pass without proving that baseline comparison is wired correctly. Please add an explicit BeeOS baseline mock and assert at least one baseline score/value, not just testCount.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/__tests__/eval.test.ts` around lines 243 - 272, The test currently seeds
a suite that references baselineFunctionId "beeos::route-task-baseline" but
mockTrigger only returns outputs for "evolved::" and "test::" IDs, so the BeeOS
baseline path is not exercised; update the test's mocking setup (the mockTrigger
used in this test suite) to include a synthetic response for
"beeos::route-task-baseline" (or treat "beeos::" prefixes) that returns a
plausible baseline output with at least one score/value, then extend the
assertions after call("eval::suite") to check baselineAggregate contains that
score/value (not just testCount), referencing the mockTrigger and the suite seed
using suiteId "suite-routing" and baselineFunctionId
"beeos::route-task-baseline".
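A sketch of the suggested mock extension. The exact `mockTrigger` shape in the test suite may differ, and the output payload here (`{ result, score }`) is an assumed baseline format:

```typescript
type TriggerReq = { function_id: string; payload?: { value?: number } };

// Prefix-dispatching mock: "beeos::" IDs now return a deterministic baseline
// output instead of falling through to null, so baseline aggregation runs.
function mockTrigger(req: TriggerReq): Promise<unknown> {
  const value = req.payload?.value ?? 0;
  if (req.function_id.startsWith("beeos::")) {
    return Promise.resolve({ result: value * 2, score: 1 });
  }
  if (req.function_id.startsWith("evolved::") || req.function_id.startsWith("test::")) {
    return Promise.resolve({ result: value * 2 });
  }
  return Promise.resolve(null); // previous behavior for unknown IDs
}
```

With that in place, the assertion can check a baseline score as well, e.g. `expect.objectContaining({ testCount: 1, meanScore: 1 })` (the `meanScore` field name is an assumption about the aggregate shape).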
response_format: {
  name: "health_response",
  type: "object",
  body: [
    { name: "status", type: "string", description: "Health status" },
    { name: "version", type: "string", description: "Application version" },
    { name: "workers", type: "number", description: "Active worker count" },
    { name: "uptime", type: "number", description: "Process uptime in seconds" },
  ],
},
api::health still advertises the wrong body for the shutdown path.
On Lines 371-379 this handler returns { status, version, inFlight } with a 503, but the new schema only describes the healthy { status, version, workers, uptime } shape. Any BeeOS/generated client consuming response_format can mis-handle shutdown responses here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/api.ts` around lines 359 - 368, The api::health handler currently
documents only the healthy response shape (status, version, workers, uptime) but
the shutdown path returns {status, version, inFlight} with a 503; update the
response_format for the health endpoint (the health_response schema) to reflect
both shapes—either by defining a union/oneOf with a separate shutdown_response
schema (including status:string, version:string, inFlight:number) or by making
workers and uptime optional and adding an optional inFlight:number field, and
ensure the 503 shutdown case is explicitly documented in the schema so generated
clients will handle it correctly.
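One way to describe both shapes, assuming the `response_format` schema language supports an `optional` flag per field (if it only supports unions, a separate `shutdown_response` schema under `oneOf` is the alternative):

```typescript
type Field = { name: string; type: string; description: string; optional?: boolean };

// Single schema covering both the healthy 200 body and the 503 shutdown body.
const healthResponseFormat: { name: string; type: string; body: Field[] } = {
  name: "health_response",
  type: "object",
  body: [
    { name: "status", type: "string", description: "Health status" },
    { name: "version", type: "string", description: "Application version" },
    // Healthy (200) path only:
    { name: "workers", type: "number", optional: true, description: "Active worker count" },
    { name: "uptime", type: "number", optional: true, description: "Process uptime in seconds" },
    // Shutdown (503) path only:
    { name: "inFlight", type: "number", optional: true, description: "Requests still draining" },
  ],
};
```

Either way, generated clients then have a documented field to read on the 503 path instead of failing validation.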
  const body: Record<string, unknown> = {
    model: req.model.model,
-   max_tokens: req.model.maxTokens,
+   max_completion_tokens: req.model.maxTokens,
    messages,
🧩 Analysis chain
🌐 Web query:
For current /chat/completions APIs, which token-limit field is accepted by OpenAI, Groq, DeepSeek, Mistral, Together, Fireworks, OpenRouter, vLLM, LM Studio, and Ollama: max_tokens, max_completion_tokens, or both?
💡 Result:
OpenAI: max_tokens (citations: [3,5,7]; confidence: high)
Groq: both (max_completion_tokens preferred, max_tokens deprecated; citations: [6,10]; confidence: high)
DeepSeek: max_tokens (citations: [11,12]; confidence: high)
Mistral: max_tokens (citations: [17,19]; confidence: high)
Together: max_tokens (citations: [23]; confidence: high)
Fireworks: max_tokens (citations: [26,30]; confidence: high)
OpenRouter: max_tokens (citations: [31,34]; confidence: high)
vLLM: max_tokens (citations: [36,37]; confidence: high)
LM Studio: max_tokens (citations: [41,45]; confidence: high)
Ollama: max_tokens (OpenAI compatible, no contradicting evidence; confidence: medium)
Citations:
- 1: http://developers.openai.com/api/docs/api-reference/chat/object
- 2: https://platform.openai.com/docs/api-reference/chat/message-list
- 3: https://console.groq.com/docs/api-reference
- 4: https://community.openai.com/t/chat-completions-api-max-tokens-default-value-is-missing/852146
- 5: https://console.groq.com/docs/text-chat
- 6: https://api-docs.deepseek.com/api/create-chat-completion
- 7: https://chat-deep.ai/docs/api/
- 8: https://docs.mistral.ai/api/endpoint/chat
- 9: https://docs.mistral.ai/api
- 10: https://docs.together.ai/reference/chat-completions
- 11: https://fireworks.ai/docs/faq-new/models-inference/what-are-the-maximum-completion-token-limits-for-models-and-can-they-be-increase
- 12: https://docs.fireworks.ai/api-reference/post-chatcompletions
- 13: https://openrouter.ai/docs/api-reference/chat-completion
- 14: https://openrouter.ai/docs/api-reference/overview
- 15: https://docs.vllm.ai/en/v0.9.2/serving/openai_compatible_server.html
- 16: https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html
- 17: https://lmstudio.ai/docs/api/openai-api
- 18: https://lmstudio.ai/docs/developer/openai-compat/chat-completions
Gate max_completion_tokens by provider.
Sending max_completion_tokens universally breaks compatibility with most backends. Web search confirms that OpenAI, DeepSeek, Mistral, Together, Fireworks, OpenRouter, vLLM, LM Studio, and Ollama all expect max_tokens. Only Groq prefers max_completion_tokens (though it accepts max_tokens for backward compatibility).
This field must be provider-specific in the request body to avoid request rejections across the majority of OpenAI-compatible backends.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llm-router.ts` around lines 298 - 301, The request body currently always
includes max_completion_tokens which breaks many backends; modify the body
construction around where body: Record<string, unknown> is created (using
req.model.model, req.model.maxTokens, messages) to conditionally set the token
field per provider: if the provider is Groq (check req.provider or
req.model.provider) include max_completion_tokens: req.model.maxTokens,
otherwise include max_tokens: req.model.maxTokens; ensure you do not send both
keys and update any downstream code expecting a single token field.
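The gate can be as small as one helper; the provider string values here ("groq", "openai") are assumptions about how llm-router labels backends:

```typescript
// Only Groq gets max_completion_tokens; every other OpenAI-compatible
// backend keeps the widely supported max_tokens. Exactly one key is emitted.
function tokenLimitField(provider: string, maxTokens: number): Record<string, number> {
  return provider === "groq"
    ? { max_completion_tokens: maxTokens }
    : { max_tokens: maxTokens };
}

// Spread the single resulting key into the request body:
const body: Record<string, unknown> = {
  model: "example-model",
  messages: [],
  ...tokenLimitField("openai", 1024),
};
```

This keeps the body construction declarative and makes adding future provider quirks a one-line change.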
registerFunction(
  {
    id: "swarm::status",
    description: "Get current swarm status and message totals",
    metadata: { category: "swarm" },
  },
  async (req: any) => {
    if (req.headers) requireAuth(req);
    const { swarmId } = req.body || req;
    const safeSwarmId = sanitizeId(swarmId);

    const swarm = (await trigger({ function_id: "state::get", payload: {
      scope: "swarms",
      key: safeSwarmId,
    } }).catch(() => null)) as SwarmConfig | null;

    if (!swarm) throw new Error(`Swarm ${safeSwarmId} not found`);

    const messages = (await trigger({ function_id: "state::list", payload: {
      scope: `swarm_messages:${safeSwarmId}`,
    } }).catch(() => [])) as any[];

    return {
      swarmId: safeSwarmId,
      status: swarm.status,
      goal: swarm.goal,
      agentIds: swarm.agentIds,
      totalMessages: messages.length,
      cancelledAt: swarm.cancelledAt,
      cancelReason: swarm.cancelReason,
    };
  },
);
Missing HTTP trigger for swarm::status.
This function includes if (req.headers) requireAuth(req) which indicates it's designed for HTTP access, but no HTTP trigger is registered at the end of the file. Additionally, the path api/swarm/:id/status is already mapped to swarm::collect, which creates potential confusion.
Consider registering an HTTP trigger with a distinct path:
Proposed fix
Add after line 416:
registerTrigger({
type: "http",
function_id: "swarm::status",
config: { api_path: "api/swarm/:id/info", http_method: "GET" },
});

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/swarm.ts` around lines 200 - 232, The swarm::status registered function
appears to expect HTTP requests (it checks req.headers and calls requireAuth)
but no HTTP trigger is registered; add a registerTrigger call that maps the
"swarm::status" function to a distinct HTTP path (avoid colliding with existing
"swarm::collect"), e.g. registerTrigger with type "http", function_id
"swarm::status", config using a GET method and an api_path such as
"api/swarm/:id/info"; place this trigger registration near other trigger
registrations (after the registerFunction for swarm::status or with the other
registerTrigger calls) so HTTP requests route correctly to the
requireAuth-protected handler.
registerFunction(
  {
    id: "swarm::cancel",
    description: "Cancel an active swarm",
    metadata: { category: "swarm" },
  },
  async (req: any) => {
    if (req.headers) requireAuth(req);
    const { swarmId, reason = "operator cancel" } = req.body || req;
    const safeSwarmId = sanitizeId(swarmId);

    const swarm = (await trigger({ function_id: "state::get", payload: {
      scope: "swarms",
      key: safeSwarmId,
    } }).catch(() => null)) as SwarmConfig | null;

    if (!swarm) throw new Error(`Swarm ${safeSwarmId} not found`);
    if (swarm.status !== "active") throw new Error(`Swarm ${safeSwarmId} is not active`);

    await trigger({ function_id: "state::update", payload: {
      scope: "swarms",
      key: safeSwarmId,
      operations: [
        { type: "set", path: "status", value: "cancelled" },
        { type: "set", path: "cancelledAt", value: Date.now() },
        { type: "set", path: "cancelReason", value: reason },
      ],
    } });

    triggerVoid("publish", {
      topic: `swarm:${safeSwarmId}`,
      data: { type: "swarm_cancelled", swarmId: safeSwarmId, reason },
    });

    triggerVoid("security::audit", {
      type: "swarm_cancelled",
      detail: { swarmId: safeSwarmId, reason },
    });

    return { cancelled: true, swarmId: safeSwarmId };
  },
);
Missing HTTP trigger for swarm::cancel.
This function includes auth handling via if (req.headers) requireAuth(req) indicating HTTP access was intended, but no corresponding HTTP trigger is registered.
Proposed fix
Add after line 416:
registerTrigger({
type: "http",
function_id: "swarm::cancel",
config: { api_path: "api/swarm/:id/cancel", http_method: "POST" },
});

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/swarm.ts` around lines 349 - 390, The handler "swarm::cancel" is written
to accept HTTP-style requests (it checks req.headers and calls requireAuth) but
no HTTP trigger is registered; add a registerTrigger entry that maps an HTTP
POST to this function so API calls reach it (e.g., registerTrigger with type
"http", function_id "swarm::cancel", config specifying api_path
"api/swarm/:id/cancel" and http_method "POST"); ensure the trigger passes the
route param as swarmId (so sanitizeId(swarmId) in swarm::cancel still works) and
that auth flows remain unchanged.
async (req: any) => {
  if (req.headers) requireAuth(req);
  const { runId } = req.body || req;
  return trigger({ function_id: "state::get", payload: {
    scope: "workflow_runs",
    key: runId,
  } });
},
Missing runId validation and inconsistent null handling.
- Per coding guidelines, IDs should be validated with sanitizeId() before use.
- workflow::get_run returns null for non-existent runs, while workflow::cancel throws an error. Consider consistent behavior: either throw or return a clear "not found" indicator.
🛡️ Proposed fix
+import { sanitizeId } from "./shared/utils.js";
+
 registerFunction(
   {
     id: "workflow::get_run",
     description: "Get one workflow run by id",
     metadata: { category: "workflow" },
   },
   async (req: any) => {
     if (req.headers) requireAuth(req);
     const { runId } = req.body || req;
+    sanitizeId(runId);
-    return trigger({ function_id: "state::get", payload: {
+    const run = await trigger({ function_id: "state::get", payload: {
       scope: "workflow_runs",
       key: runId,
     } });
+    if (!run) throw new Error(`Workflow run ${runId} not found`);
+    return run;
   },
 );

As per coding guidelines: "Validate IDs with sanitizeId() (alphanumeric + _-:., max 256 chars)"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

  async (req: any) => {
    if (req.headers) requireAuth(req);
    const { runId } = req.body || req;
    return trigger({ function_id: "state::get", payload: {
      scope: "workflow_runs",
      key: runId,
    } });
  },

Suggested:

  async (req: any) => {
    if (req.headers) requireAuth(req);
    const { runId } = req.body || req;
    sanitizeId(runId);
    const run = await trigger({ function_id: "state::get", payload: {
      scope: "workflow_runs",
      key: runId,
    } });
    if (!run) throw new Error(`Workflow run ${runId} not found`);
    return run;
  },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/workflow.ts` around lines 313 - 320, Validate and sanitize the incoming
runId before using it: call sanitizeId(runId) (per the ID rules) and use the
sanitized value in the trigger payload in this handler that calls trigger({
function_id: "state::get", ... }). After triggering, enforce consistent
not-found behavior: if the state::get response is null, either throw a clear
NotFound error (matching workflow::cancel behavior) or return a consistent "not
found" indicator object—choose the same approach used by workflow::cancel and
apply it here. Keep the existing requireAuth(req) check unchanged.
async (req: any) => {
  if (req.headers) requireAuth(req);
  const { runId } = req.body || req;
  const run = await trigger({ function_id: "state::get", payload: {
    scope: "workflow_runs",
    key: runId,
  } }) as any;
Missing runId validation with sanitizeId().
Same guideline applies here—validate user-provided IDs before passing to state operations.
🛡️ Proposed fix
 async (req: any) => {
   if (req.headers) requireAuth(req);
   const { runId } = req.body || req;
+  sanitizeId(runId);
   const run = await trigger({ function_id: "state::get", payload: {

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

  async (req: any) => {
    if (req.headers) requireAuth(req);
    const { runId } = req.body || req;
    const run = await trigger({ function_id: "state::get", payload: {
      scope: "workflow_runs",
      key: runId,
    } }) as any;

Suggested:

  async (req: any) => {
    if (req.headers) requireAuth(req);
    const runId = sanitizeId((req.body || req).runId);
    const run = await trigger({ function_id: "state::get", payload: {
      scope: "workflow_runs",
      key: runId,
    } }) as any;

Alternative:

  async (req: any) => {
    if (req.headers) requireAuth(req);
    let { runId } = req.body || req;
    runId = sanitizeId(runId);
    const run = await trigger({ function_id: "state::get", payload: {
      scope: "workflow_runs",
      key: runId,
    } }) as any;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/workflow.ts` around lines 329 - 335, In the async request handler (async
(req: any) => { … }) validate and sanitize the incoming runId before using it:
call sanitizeId(runId) (or return an error if invalid) and use the sanitized id
when building the payload for trigger({ function_id: "state::get", payload: {
scope: "workflow_runs", key: ... } }); ensure this check happens after
requireAuth(req) and before invoking trigger so no unvalidated runId is sent to
state operations.