
feat: add iii-eval worker #10

Open
rohitg00 wants to merge 1 commit into main from feat/eval

Conversation

@rohitg00 rohitg00 commented Apr 7, 2026

Summary

  • OTel-native evaluation worker with 7 functions
  • Ingest spans, compute P50/P95/P99 metrics, score system health, detect drift, analyze traces
  • All through iii primitives: State, Cron, PubSub Subscribe, HTTP triggers
  • 19 tests, 0 warnings

Functions

| Function | Description |
| --- | --- |
| `eval::ingest` | Receive span data via PubSub topic `telemetry.spans` |
| `eval::metrics` | P50/P95/P99 latency, success rate, throughput |
| `eval::score` | System-wide health score (0-100) |
| `eval::drift` | Detect metric drift against baselines |
| `eval::baseline` | Snapshot current metrics |
| `eval::report` | Full report: metrics + score + drift |
| `eval::analyze_traces` | Read OTel traces, find patterns, generate insights |
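The P50/P95/P99 figures above are computed from span durations held in state. The crate's exact implementation is not shown in this PR view; as a rough illustration, a nearest-rank percentile over sorted durations could look like the sketch below (the function name and rank convention are assumptions, not the actual code):

```rust
// Illustrative sketch only: nearest-rank percentile over span durations,
// roughly the shape of what eval::metrics reports as P50/P95/P99.
fn percentile(sorted: &[u64], p: f64) -> u64 {
    if sorted.is_empty() {
        return 0; // no spans recorded yet
    }
    // Nearest-rank: ceil(p/100 * n) gives a 1-based rank into the sorted slice.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

fn main() {
    let mut durations_ms: Vec<u64> = vec![120, 80, 200, 95, 150, 300, 110, 90, 130, 100];
    durations_ms.sort_unstable();
    println!(
        "p50={}ms p95={}ms p99={}ms",
        percentile(&durations_ms, 50.0),
        percentile(&durations_ms, 95.0),
        percentile(&durations_ms, 99.0)
    );
}
```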

Test plan

  • cargo test — 19 tests passing
  • cargo check — 0 warnings
  • E2E tested with engine v0.10.0 + console chat bar

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced an evaluation worker that monitors function execution performance: it processes telemetry data, calculates success and latency metrics, computes system health scores, and detects performance regressions against saved baselines.
    • Added configurable performance drift detection thresholds with automated periodic checks.
    • Exposed REST API and CLI interface for generating comprehensive evaluation reports.
  • Documentation

    • Added specification and detailed setup documentation.


coderabbitai bot commented Apr 7, 2026

Warning

Rate limit exceeded

@rohitg00 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 8 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 21 minutes and 8 seconds.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e2ccfbee-807e-4bd9-8591-887c66fc49bf

📥 Commits

Reviewing files that changed from the base of the PR and between 4068e9c and 11a5a84.

📒 Files selected for processing (17)
  • eval/Cargo.toml
  • eval/README.md
  • eval/SPEC.md
  • eval/build.rs
  • eval/config.yaml
  • eval/src/config.rs
  • eval/src/functions/analyze.rs
  • eval/src/functions/baseline.rs
  • eval/src/functions/drift.rs
  • eval/src/functions/ingest.rs
  • eval/src/functions/metrics.rs
  • eval/src/functions/mod.rs
  • eval/src/functions/report.rs
  • eval/src/functions/score.rs
  • eval/src/functions/state.rs
  • eval/src/main.rs
  • eval/src/manifest.rs
📝 Walkthrough

A new evaluation worker (iii-eval) crate is added under eval/ with a complete Rust binary for ingesting OpenTelemetry span data, computing metrics, detecting drift, and generating health scores. The system registers seven evaluation functions with the III engine and uses persistent state storage and pub/sub triggers for operation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration & Build**: `eval/Cargo.toml`, `eval/build.rs`, `eval/config.yaml`, `eval/src/config.rs`, `eval/src/manifest.rs` | Cargo manifest with dependencies (tokio, serde, tracing, clap, chrono); build script for target export; YAML config schema with retention/drift/cron settings; configuration loader with defaults and YAML deserialization; manifest builder using Cargo macros. |
| **Documentation**: `eval/README.md`, `eval/SPEC.md` | README describing the evaluation worker responsibilities and CLI usage; specification detailing six API functions, the state schema, two trigger types (cron and PubSub), and an example curl workflow. |
| **Core Handlers**: `eval/src/functions/ingest.rs`, `metrics.rs`, `baseline.rs`, `drift.rs`, `score.rs`, `report.rs`, `analyze.rs` | Seven async handlers: ingest appends span events to state with cap enforcement; metrics computes p50/p95/p99 latencies, success rate, and throughput; baseline snapshots current metrics; drift compares metrics against baselines with threshold detection; score calculates 0–100 health with penalties for failures and high latency; report aggregates metrics/drift/scores across functions; analyze extracts and ranks traces by latency/activity/errors. |
| **Supporting Modules**: `eval/src/functions/state.rs`, `eval/src/functions/mod.rs`, `eval/src/main.rs` | State helpers for persistent get/set operations; function registration binding all handlers to the III engine with request/response schemas and trigger setup; main binary entry point with CLI parsing, config loading, worker registration, and trigger configuration (cron for drift, PubSub for ingest, HTTP for trace analysis). |
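The score handler's penalty shape (and the hardcoded 0.95 success-rate and 5000 ms p99 thresholds flagged later in this review) can be sketched as follows; the penalty weights are illustrative assumptions, not the crate's actual constants:

```rust
// Illustrative sketch of a 0-100 health score with penalties for failures
// and high latency. Weights are assumptions; the 0.95 and 5000 ms
// thresholds mirror the hardcoded values cited in the review comments.
fn health_score(success_rate: f64, p99_ms: u64) -> f64 {
    let mut score = 100.0;
    if success_rate < 0.95 {
        // 2 points per percentage point below the success threshold
        score -= (0.95 - success_rate) * 200.0;
    }
    if p99_ms > 5000 {
        // 2 points per second of p99 latency over the threshold
        score -= ((p99_ms - 5000) as f64 / 1000.0) * 2.0;
    }
    score.clamp(0.0, 100.0)
}

fn main() {
    println!("healthy: {}", health_score(0.99, 1200));
    println!("degraded: {}", health_score(0.90, 6000));
}
```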

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant III as III Engine
    participant PubSub
    participant State
    participant Handlers

    Note over Client,Handlers: Span Ingestion Flow
    PubSub->>III: Publish telemetry.spans
    III->>Handlers: Trigger eval::ingest
    Handlers->>State: Load existing spans
    Handlers->>State: Append new span & cap
    State-->>Handlers: Confirm stored
    Handlers-->>III: Return result

    Note over Client,Handlers: Metrics & Scoring Flow
    Client->>III: POST /eval/metrics
    III->>Handlers: Trigger eval::metrics
    Handlers->>State: Load function spans
    Handlers->>Handlers: Compute p50/p95/p99, rates
    Handlers-->>III: Return metrics JSON
    
    Client->>III: POST /eval/score
    III->>Handlers: Trigger eval::score
    Handlers->>State: Load all function spans
    Handlers->>Handlers: Aggregate & penalize scores
    Handlers-->>III: Return health score
sequenceDiagram
    participant Client
    participant III as III Engine
    participant State
    participant Handlers
    participant Cron

    Note over Client,Handlers: Baseline & Drift Detection
    Client->>III: POST /eval/baseline?function_id=f1
    III->>Handlers: Trigger eval::baseline
    Handlers->>State: Load spans for f1
    Handlers->>Handlers: Compute current metrics
    Handlers->>State: Store baseline snapshot
    State-->>Handlers: Confirm
    Handlers-->>III: Return saved baseline

    Cron->>III: Trigger (cron: every 10min)
    III->>Handlers: Trigger eval::drift
    Handlers->>State: Load baseline for each function
    Handlers->>State: Load current spans
    Handlers->>Handlers: Compare metrics dimensions
    Handlers->>Handlers: Calculate delta percentages
    Handlers-->>III: Return drift results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • sergiofilhowz

Poem

🐰 Whiskers twitch with glee,
Seven functions hop and spree,
Spans ingested, drifts detected,
Health and scores meticulously perfected!
Metrics dance through state so sweet,
This eval worker can't be beat! 🎯✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.09%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'feat: add iii-eval worker' clearly summarizes the main change: a new evaluation worker with multiple functions for OpenTelemetry-native span processing. |


OTel-native evaluation worker for iii engine. 7 functions:
- eval::ingest — receive span data via PubSub
- eval::metrics — P50/P95/P99 latency, success rate, throughput
- eval::score — system-wide health score (0-100)
- eval::drift — detect metric drift against baselines
- eval::baseline — snapshot current metrics
- eval::report — full report with metrics + score + drift
- eval::analyze_traces — read OTel traces, find patterns, generate insights

Uses iii primitives: State (spans, baselines), Cron (drift check), Subscribe (telemetry.spans), HTTP triggers.
19 tests, 0 warnings.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (12)
eval/src/manifest.rs (1)

17-23: Avoid duplicating defaults between manifest and runtime config.

default_config is hardcoded here; it can drift from EvalConfig::default() over time.

Proposed fix
+use crate::config::EvalConfig;
+
 pub fn build_manifest() -> ModuleManifest {
     ModuleManifest {
         name: env!("CARGO_PKG_NAME").to_string(),
         version: env!("CARGO_PKG_VERSION").to_string(),
         description: "III engine OTel-native evaluation worker".to_string(),
-        default_config: serde_json::json!({
-            "retention_hours": 24,
-            "drift_threshold": 0.15,
-            "cron_drift_check": "0 */10 * * * *",
-            "max_spans_per_function": 1000,
-            "baseline_window_minutes": 60
-        }),
+        default_config: serde_json::to_value(EvalConfig::default())
+            .expect("EvalConfig should always serialize"),
         supported_targets: vec![env!("TARGET").to_string()],
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/manifest.rs` around lines 17 - 23, The manifest hardcodes
default_config JSON that duplicates EvalConfig::default(), causing drift;
replace the hardcoded serde_json::json! block with a conversion from
EvalConfig::default() (e.g., serialize or convert EvalConfig::default() into a
serde_json::Value) so default_config is derived from EvalConfig::default()
instead of duplicated; update the code that constructs default_config (the
variable named default_config in this diff) to call EvalConfig::default() and
serialize/convert it into the same JSON/Value shape used elsewhere.
eval/src/config.rs (1)

51-54: Validate semantic ranges after deserialization.

load_config should enforce constraints (for example: drift_threshold in [0.0, 1.0], non-zero max_spans_per_function) before returning config.

Proposed fix
 pub fn load_config(path: &str) -> Result<EvalConfig> {
     let contents = std::fs::read_to_string(path)?;
     let config: EvalConfig = serde_yaml::from_str(&contents)?;
+    if !(0.0..=1.0).contains(&config.drift_threshold) {
+        anyhow::bail!("drift_threshold must be between 0.0 and 1.0");
+    }
+    if config.max_spans_per_function == 0 {
+        anyhow::bail!("max_spans_per_function must be > 0");
+    }
     Ok(config)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/config.rs` around lines 51 - 54, After deserializing into EvalConfig
in load_config, validate semantic constraints before returning: check that
EvalConfig.drift_threshold is within 0.0..=1.0 and that
EvalConfig.max_spans_per_function is > 0 (and add any other domain rules
required by EvalConfig); if any check fails return an Err with a clear message
(e.g., using anyhow::bail or Result::Err with a descriptive string) instead of
Ok(config). Ensure the validation lives in load_config immediately after
serde_yaml::from_str so callers never receive an out-of-range EvalConfig.
eval/src/functions/report.rs (2)

35-42: Silent error swallowing may hide drift detection failures.

Using .ok() to ignore drift handler errors means failures (e.g., state read errors, serialization issues) are silently discarded. Consider logging drift failures or including a drift_error field in the report for observability.

Suggested improvement
         let drift_result = if has_baseline {
             let drift_payload = json!({ "function_id": fid });
-            crate::functions::drift::handle(iii, config, drift_payload)
-                .await
-                .ok()
+            match crate::functions::drift::handle(iii, config, drift_payload).await {
+                Ok(result) => Some(result),
+                Err(e) => {
+                    tracing::warn!(function_id = %fid, error = %e, "drift check failed");
+                    None
+                }
+            }
         } else {
             None
         };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/report.rs` around lines 35 - 42, Replace the silent .ok()
drop of errors when calling crate::functions::drift::handle so drift failures
are observable: change the handling around the drift_result (the block using
has_baseline, drift_payload, and the call to
crate::functions::drift::handle(iii, config, drift_payload)) to capture the
Result, log the error via the existing logger or attach an error string to the
report (e.g., a drift_error field) instead of returning None, and ensure success
still yields the drift payload; this preserves the existing flow but surfaces
errors for observability.

18-50: Per-function drift calls are inefficient when checking all functions.

The loop calls drift::handle individually for each function with a baseline. However, drift::handle already supports processing all functions when function_id is omitted from the payload. Consider a single drift call for the full report, then matching results by function_id.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/report.rs` around lines 18 - 50, The per-function loop
currently calls crate::functions::drift::handle(...) for each fid; instead, call
crate::functions::drift::handle once outside the loop with a payload that omits
"function_id" to get drift results for all functions, then inside the loop
lookup the drift entry for each fid (e.g., map by function_id) when building
function_reports; adjust the existing variables (function_ids iteration,
existing/state_get logic, compute_metrics call) to use the single drift result
map rather than invoking drift::handle per fid and fall back to None when a fid
has no entry.
eval/src/functions/ingest.rs (1)

21-26: Error message could be more descriptive.

The error message "missing {field}" doesn't indicate whether the field was absent or had wrong type. Consider distinguishing these cases.

Suggested improvement
 fn require_u64(payload: &Value, field: &str) -> Result<u64, IIIError> {
-    payload
-        .get(field)
-        .and_then(|v| v.as_u64())
-        .ok_or_else(|| IIIError::Handler(format!("missing {field}")))
+    match payload.get(field) {
+        Some(v) => v.as_u64().ok_or_else(|| IIIError::Handler(format!("{field} must be a non-negative integer"))),
+        None => Err(IIIError::Handler(format!("missing required field: {field}"))),
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/ingest.rs` around lines 21 - 26, The require_u64 helper
(fn require_u64) currently returns a generic "missing {field}" error; change it
to distinguish between a missing key and a present-but-wrong-type value by first
checking payload.get(field): if None return IIIError::Handler(format!("missing
field '{}'", field)), else if value.as_u64() is None return
IIIError::Handler(format!("field '{}' has wrong type: expected u64, found {}",
field, value.type_name() or value.to_string())), so callers can see whether the
key was absent or contained a non-u64 value; reference require_u64 and use
payload.get(field) and value.as_u64() to implement this branching.
eval/src/functions/metrics.rs (1)

48-53: Potential mismatch between total and durations.len().

total is set to spans.len() but avg_duration_ms divides by durations.len(). If some spans are missing duration_ms, these counts differ. This is intentional (average only over valid durations), but total_invocations in the output may not match the denominator used for average. Consider documenting this or using consistent counts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/metrics.rs` around lines 48 - 53, The code currently sets
total = spans.len() while avg_duration_ms divides by durations.len(), causing
total_invocations (total) to differ from the average's denominator if some spans
lack duration_ms; update the calculation to use a consistent count: compute a
new valid_count = durations.len() (or filter spans to count only those with
duration_ms), use valid_count for the average denominator and for
total_invocations (or explicitly document that total_invocations reflects all
spans while average is over valid durations). Locate the variables total, spans,
durations.len(), avg_duration_ms and total_invocations in metrics.rs and either
set total = durations.len() (or total = valid_count) or add a separate
total_valid_durations and use that for avg_duration_ms and output consistently.
eval/src/functions/analyze.rs (2)

256-268: Division guard relies on filter correctness.

Line 260 divides by s.invocations after filtering for s.error_count > 0. This is safe because error_count can only be incremented when invocations is also incremented (lines 159, 167), but the implicit coupling could be fragile. A defensive guard would be safer.

Optional defensive guard
     let errors: Vec<Value> = stats_map
         .iter()
-        .filter(|(_, s)| s.error_count > 0)
+        .filter(|(_, s)| s.error_count > 0 && s.invocations > 0)
         .map(|(fid, s)| {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/analyze.rs` around lines 256 - 268, The current mapping
that builds errors from stats_map computes error_rate by dividing s.error_count
by s.invocations (in the map that produces the `errors: Vec<Value>`), but this
relies on the invariant that invocations > 0; add a defensive guard inside that
closure used in the map (referencing `stats_map`, the closure parameters `(fid,
s)`, and `s.invocations`) so that if `s.invocations == 0` you set error_rate to
0.0 (or skip the entry) instead of performing the division, and then continue to
construct the JSON object with that guarded rate.

108-118: External trigger call should handle timeout gracefully.

The 10-second timeout is reasonable, but the error message "failed to fetch traces from engine" doesn't distinguish between timeout vs other failures. Consider enriching the error context.

Suggested improvement
         .await
         .map_err(|e| {
-            IIIError::Handler(format!("failed to fetch traces from engine: {e}"))
+            IIIError::Handler(format!("failed to fetch traces from engine::traces::list: {e}"))
         })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/analyze.rs` around lines 108 - 118, The error mapping for
the external call iii.trigger(TriggerRequest { timeout_ms: Some(10_000), ... })
should distinguish timeouts from other failures; update the map_err closure that
produces IIIError::Handler for traces_response to inspect the returned error
(e.g., check for an is_timeout() method or look for "timeout" in e.to_string())
and return a clearer message like "timed out fetching traces from engine" for
timeout cases and keep the original "failed to fetch traces from engine: {e}"
for other errors so callers can tell timeout vs other failures.
eval/src/functions/mod.rs (1)

74-93: Index update logic could be moved into ingest::handle.

The function index update (checking if fid exists, appending if not) is performed in the registration wrapper rather than in ingest::handle. This splits related logic across two locations. Consider moving this into ingest::handle for better cohesion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/mod.rs` around lines 74 - 93, The registration wrapper in
mod.rs currently performs function index management (checking fid, fetching via
state::state_get, deserializing, pushing fid, and persisting with
state::state_set using ingest::SCOPE_INDEX and ingest::INDEX_KEY); move this
entire index-update block into ingest::handle so related ingest/register logic
lives together—remove the index update from the registration wrapper and call or
invoke the updated ingest::handle path that performs: retrieve index with
state::state_get, safely deserialize to Vec<String>, check contains(&fid), push
fid, and persist with state::state_set, preserving the tracing::warn on
deserialize or state_set errors. Ensure you reference the same symbols (fid,
state::state_get, state::state_set, ingest::SCOPE_INDEX, ingest::INDEX_KEY,
ingest::handle) and keep behavior (no-op if fid empty) unchanged.
eval/src/functions/score.rs (1)

7-7: Consider accepting EvalConfig for configurable thresholds.

The handle function uses hardcoded thresholds (0.95, 5000ms, 0.80, 10000ms) while drift::handle takes &EvalConfig for its configurable drift_threshold. For consistency and operational flexibility, the scoring thresholds should also be configurable.

Suggested signature change
-pub async fn handle(iii: &Arc<III>, _payload: Value) -> Result<Value, IIIError> {
+pub async fn handle(iii: &Arc<III>, config: &EvalConfig, _payload: Value) -> Result<Value, IIIError> {
+    let success_threshold = config.score_success_threshold.unwrap_or(0.95);
+    let p99_threshold_ms = config.score_p99_threshold_ms.unwrap_or(5000);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/score.rs` at line 7, The handle function in
eval/src/functions/score.rs currently hardcodes thresholds (0.95, 5000ms, 0.80,
10000ms); change its signature to accept an &EvalConfig (or a reference to it)
and replace those literal values with the corresponding fields from EvalConfig
(e.g., drift_threshold and timeout/threshold fields), and pass that config into
drift::handle (which already expects &EvalConfig); update any callers of
score::handle to supply the EvalConfig reference so scoring behavior becomes
configurable.
eval/SPEC.md (1)

13-19: Add language identifiers to fenced code blocks.

Static analysis flagged multiple code blocks without language specifiers. Add appropriate identifiers for better rendering and syntax highlighting.

Examples
-```
+```text
 Your Workers → OTel spans → PubSub topic "telemetry.spans" → eval::ingest
-```
+```json
 Input:  {
   function_id: string,

Also applies to: 25-29, 35-48, 54-61, 67-76, 86-96, 102-106, 111-119, 125-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/SPEC.md` around lines 13 - 19, Several fenced code blocks in SPEC.md
(e.g., the diagram block starting "Your Workers → OTel spans → PubSub topic
\"telemetry.spans\" → eval::ingest" and the JSON-like block starting "Input:  {
function_id: string,") are missing language identifiers; update each
triple-backtick fence to include an appropriate language label (e.g., ```text
for plain diagrams, ```json for JSON examples, ```yaml for YAML snippets) for
all affected blocks (also apply same change to the ranges noted: 25-29, 35-48,
54-61, 67-76, 86-96, 102-106, 111-119, 125-133) so that syntax highlighting and
rendering are correct.
eval/src/functions/drift.rs (1)

53-67: Relative threshold may be inappropriate for success_rate.

A 15% relative change in success_rate means:

  • 100% → 85% triggers drift (reasonable)
  • 90% → 76.5% triggers drift (reasonable)
  • 50% → 42.5% triggers drift (but 50% success rate is already critical)

Consider using absolute thresholds for success_rate (e.g., any drop > 5 percentage points) while keeping relative thresholds for latency metrics.
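A sketch of that split, using a relative threshold for the latency dimensions and an absolute percentage-point threshold for success_rate (both threshold values are illustrative, not the crate's defaults):

```rust
// Illustrative sketch: relative drift for latency metrics, absolute
// drift for success_rate, as suggested above. Thresholds are examples.
fn latency_drifted(baseline_ms: f64, current_ms: f64, rel_threshold: f64) -> bool {
    if baseline_ms == 0.0 {
        return false; // no meaningful relative change from a zero baseline
    }
    ((current_ms - baseline_ms) / baseline_ms).abs() > rel_threshold
}

fn success_rate_drifted(baseline: f64, current: f64, abs_threshold: f64) -> bool {
    // Only drops count as drift; an improving success rate is fine.
    baseline - current > abs_threshold
}

fn main() {
    println!("{}", latency_drifted(100.0, 120.0, 0.15)); // 20% slower than baseline
    println!("{}", success_rate_drifted(0.90, 0.76, 0.05)); // 14-point drop
}
```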

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/drift.rs` around lines 53 - 67, In the loop over
dimensions in drift.rs (the dimensions array and code computing baseline_v,
current_v and delta_pct), special-case the "success_rate" dimension: instead of
computing delta_pct as a relative change using baseline_v (the existing
delta_pct logic), compute an absolute change as abs(current_v - baseline_v) and
compare that to an absolute threshold (e.g., 0.05 for 5 percentage points); keep
the current relative delta_pct calculation for the latency metrics ("p50_ms",
"p95_ms", "p99_ms", "avg_duration_ms"). Ensure you reference the existing
symbols baseline_val, baseline_v, current, current_v, and delta_pct when adding
this branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@eval/README.md`:
- Around line 40-46: The fenced code block that begins with "Options:" is
missing a language tag; update the opening fence from ``` to ```text so the
block is marked (e.g., change the code fence before the "Options:" block to
```text) to satisfy MD040; leave the block contents and closing fence unchanged.

In `@eval/SPEC.md`:
- Line 31: The spec lists 6 functions but the implementation registers an extra
function eval::analyze_traces; add documentation for analyze_traces to
eval/SPEC.md (immediately after the eval::report section) using the suggested
block: specify Input: {limit?: integer, function_filter?: string}, Output: {
summary: {total_spans, unique_functions, time_range, error_rate},
slowest_functions: [...], most_active: [...], errors: [...], insights: [string]
} and a short description: "Reads OTel traces from the engine, aggregates
per-function stats, and generates insights about error rates and slow
functions." Ensure the new entry matches the formatting of the other function
docs so the total function count is consistent.
- Around line 25-29: Update the spec entry to match the implementation: replace
the line that documents eval:function_index with the implemented scope and key
by using the SCOPE_INDEX and INDEX_KEY names — i.e., document the state key as
eval:index:function_list and the description as "list of all tracked function
IDs" so the spec aligns with the constants SCOPE_INDEX and INDEX_KEY used in
ingest.rs.

In `@eval/src/config.rs`:
- Around line 4-17: The EvalConfig struct allows unknown/typoed config keys to
be ignored; add serde's deny_unknown_fields attribute to EvalConfig so unknown
YAML keys cause deserialization to fail fast. Modify the struct annotation for
EvalConfig to include #[serde(deny_unknown_fields)] (in addition to the existing
derives) so deserializing into EvalConfig will reject unexpected fields and
surface configuration mistakes.

In `@eval/src/functions/baseline.rs`:
- Around line 13-18: The code currently collapses all state_get failures into
json!(null) which hides backend/read errors; change the state read to explicitly
handle the Result from state_get (do not use unwrap_or(json!(null))). Match on
state_get(...).await: on Err(e) propagate or return the error (or log and return
a failure response) so callers know the read failed; on Ok(existing) then check
if existing.is_array() and parse the array into spans (using
serde_json::from_value on existing) and treat Null or non-array as empty Vec.
Update handling around the variables state_get, SCOPE_SPANS, function_id,
existing, and spans so read errors are not converted into "no span data".

In `@eval/src/functions/score.rs`:
- Around line 56-74: Scoring uses hardcoded absolute thresholds (success_rate <
0.95, p99 > 5000) which can conflict with the relative drift logic (15%
default); change the scoring in the function that adjusts fn_score (the block
referencing fn_score, success_rate, p99, and issues.push) to use configurable
thresholds (e.g., success_rate_threshold and p99_threshold) or compute
thresholds from the same relative baseline used by drift detection, then read
those values from the same config/params used by drift.rs (or surface them in
the shared config struct), and update issue payloads/tests/docs to reflect the
configurable/derived thresholds so scoring and drift detection align or
explicitly document the intended difference between absolute health vs relative
drift.

In `@eval/src/main.rs`:
- Around line 75-103: The trigger registration code calls
iii_arc.register_trigger for three critical triggers (cron -> function_id
"eval::drift", subscribe -> "eval::ingest", and http -> "eval::analyze_traces")
but only logs warnings on failure; change this to fail fast by treating
registration errors as fatal: propagate the Err out of main (or return an
Err/exit process) instead of continuing, e.g., replace the Err branches that do
tracing::warn!(...) with returning the error (or calling
anyhow::bail!/process::exit) and log at error level with context including the
register_trigger error and which RegisterTriggerInput failed (reference the
register_trigger calls and the RegisterTriggerInput structs/cront_expression) so
startup aborts when any of these critical triggers cannot be registered.
- Around line 62-71: register_worker() may fail to connect and its error is not
handled; change the code so you capture the Result from register_worker (instead
of unwrapping or ignoring errors), propagate or map the error into main()'s
Result with a clear message (e.g., using ? or map_err/anyhow::Context), only
create iii_arc and call functions::register_all(&iii_arc, &config) after a
successful register_worker return, and return an Err from main() with a
descriptive error when the connection to the III engine cannot be established.

---

Nitpick comments:
In `@eval/SPEC.md`:
- Around line 13-19: Several fenced code blocks in SPEC.md (e.g., the diagram
block starting "Your Workers → OTel spans → PubSub topic \"telemetry.spans\" →
eval::ingest" and the JSON-like block starting "Input:  { function_id: string,")
are missing language identifiers; update each triple-backtick fence to include
an appropriate language label (e.g., ```text for plain diagrams, ```json for
JSON examples, ```yaml for YAML snippets) for all affected blocks (also apply
same change to the ranges noted: 25-29, 35-48, 54-61, 67-76, 86-96, 102-106,
111-119, 125-133) so that syntax highlighting and rendering are correct.

In `@eval/src/config.rs`:
- Around line 51-54: After deserializing into EvalConfig in load_config,
validate semantic constraints before returning: check that
EvalConfig.drift_threshold is within 0.0..=1.0 and that
EvalConfig.max_spans_per_function is > 0 (and add any other domain rules
required by EvalConfig); if any check fails return an Err with a clear message
(e.g., using anyhow::bail or Result::Err with a descriptive string) instead of
Ok(config). Ensure the validation lives in load_config immediately after
serde_yaml::from_str so callers never receive an out-of-range EvalConfig.
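The validation described above is a small pure check. A sketch, assuming `EvalConfig` carries the two fields named in the comment (the real struct derives `Deserialize` and has more fields; the check would run in `load_config` right after `serde_yaml::from_str`):

```rust
// Hypothetical mirror of the relevant EvalConfig fields (the real struct has more).
pub struct EvalConfig {
    pub drift_threshold: f64,
    pub max_spans_per_function: usize,
}

// Semantic validation to run immediately after deserialization in load_config.
pub fn validate(config: &EvalConfig) -> Result<(), String> {
    if !(0.0..=1.0).contains(&config.drift_threshold) {
        return Err(format!(
            "drift_threshold must be within 0.0..=1.0, got {}",
            config.drift_threshold
        ));
    }
    if config.max_spans_per_function == 0 {
        return Err("max_spans_per_function must be > 0".to_string());
    }
    Ok(())
}
```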

In `@eval/src/functions/analyze.rs`:
- Around line 256-268: The current mapping that builds errors from stats_map
computes error_rate by dividing s.error_count by s.invocations (in the map that
produces the `errors: Vec<Value>`), but this relies on the invariant that
invocations > 0; add a defensive guard inside that closure used in the map
(referencing `stats_map`, the closure parameters `(fid, s)`, and
`s.invocations`) so that if `s.invocations == 0` you set error_rate to 0.0 (or
skip the entry) instead of performing the division, and then continue to
construct the JSON object with that guarded rate.
- Around line 108-118: The error mapping for the external call
iii.trigger(TriggerRequest { timeout_ms: Some(10_000), ... }) should distinguish
timeouts from other failures; update the map_err closure that produces
IIIError::Handler for traces_response to inspect the returned error (e.g., check
for an is_timeout() method or look for "timeout" in e.to_string()) and return a
clearer message like "timed out fetching traces from engine" for timeout cases
and keep the original "failed to fetch traces from engine: {e}" for other errors
so callers can tell timeout vs other failures.
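Both fixes reduce to small pure helpers. A sketch with hypothetical names (the real code operates on serde_json values and the SDK's error type, which may offer something better than string matching):

```rust
// First nitpick: guard the division so zero invocations yield a 0.0 rate.
fn error_rate(error_count: u64, invocations: u64) -> f64 {
    if invocations == 0 {
        0.0
    } else {
        error_count as f64 / invocations as f64
    }
}

// Second nitpick: classify the error message so timeouts read differently.
// String matching is a fallback; prefer a typed is_timeout() check if the SDK has one.
fn map_fetch_error(e: &str) -> String {
    if e.to_ascii_lowercase().contains("timeout") {
        "timed out fetching traces from engine".to_string()
    } else {
        format!("failed to fetch traces from engine: {e}")
    }
}
```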

In `@eval/src/functions/drift.rs`:
- Around line 53-67: In the loop over dimensions in drift.rs (the dimensions
array and code computing baseline_v, current_v and delta_pct), special-case the
"success_rate" dimension: instead of computing delta_pct as a relative change
using baseline_v (the existing delta_pct logic), compute an absolute change as
abs(current_v - baseline_v) and compare that to an absolute threshold (e.g.,
0.05 for 5 percentage points); keep the current relative delta_pct calculation
for the latency metrics ("p50_ms", "p95_ms", "p99_ms", "avg_duration_ms").
Ensure you reference the existing symbols baseline_val, baseline_v, current,
current_v, and delta_pct when adding this branch.
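The branch can be isolated as a pure predicate. A sketch (the 0.05 absolute cutoff is the illustrative value from the comment, not a measured constant):

```rust
// success_rate: compare the absolute change against an absolute cutoff.
// Latency dimensions: keep the existing relative delta against rel_threshold.
fn exceeds_drift(dimension: &str, baseline_v: f64, current_v: f64, rel_threshold: f64) -> bool {
    match dimension {
        "success_rate" => (current_v - baseline_v).abs() > 0.05,
        _ => {
            if baseline_v == 0.0 {
                return false; // avoid dividing by a zero baseline
            }
            ((current_v - baseline_v) / baseline_v).abs() > rel_threshold
        }
    }
}
```

With a relative threshold, a drop from 0.99 to 0.90 (about 9% relative) would never fire; the absolute rule catches it.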

In `@eval/src/functions/ingest.rs`:
- Around line 21-26: The require_u64 helper (fn require_u64) currently returns a
generic "missing {field}" error; change it to distinguish between a missing key
and a present-but-wrong-type value by first checking payload.get(field): if None
return IIIError::Handler(format!("missing field '{}'", field)), else if
value.as_u64() is None return IIIError::Handler(format!("field '{}' has wrong
type: expected u64, found {}", field, value.type_name() or value.to_string())),
so callers can see whether the key was absent or contained a non-u64 value;
reference require_u64 and use payload.get(field) and value.as_u64() to implement
this branching.
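The branching can be sketched with a stand-in value type (in the real helper the payload is a serde_json value, where `as_u64()` plays the role of the `U64` match arm):

```rust
use std::collections::HashMap;

#[derive(Debug)]
enum Val {
    U64(u64),
    Str(String),
}

// Distinguish a missing key from a present-but-wrong-type value.
fn require_u64(payload: &HashMap<String, Val>, field: &str) -> Result<u64, String> {
    match payload.get(field) {
        None => Err(format!("missing field '{field}'")),
        Some(Val::U64(n)) => Ok(*n),
        Some(other) => Err(format!(
            "field '{field}' has wrong type: expected u64, found {other:?}"
        )),
    }
}
```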

In `@eval/src/functions/metrics.rs`:
- Around line 48-53: The code currently sets total = spans.len() while
avg_duration_ms divides by durations.len(), causing total_invocations (total) to
differ from the average's denominator if some spans lack duration_ms; update the
calculation to use a consistent count: compute a new valid_count =
durations.len() (or filter spans to count only those with duration_ms), use
valid_count for the average denominator and for total_invocations (or explicitly
document that total_invocations reflects all spans while average is over valid
durations). Locate the variables total, spans, durations.len(), avg_duration_ms
and total_invocations in metrics.rs and either set total = durations.len() (or
total = valid_count) or add a separate total_valid_durations and use that for
avg_duration_ms and output consistently.
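The consistency fix is easiest to see with the counts pulled into one helper. A sketch where spans are reduced to optional durations:

```rust
// Returns (total_spans, valid_count, avg_duration_ms): the average divides by
// valid_count, so callers can report valid_count alongside total instead of
// mixing denominators.
fn duration_stats(durations: &[Option<u64>]) -> (usize, usize, f64) {
    let total = durations.len();
    let valid: Vec<u64> = durations.iter().filter_map(|d| *d).collect();
    let valid_count = valid.len();
    let avg = if valid_count == 0 {
        0.0
    } else {
        valid.iter().sum::<u64>() as f64 / valid_count as f64
    };
    (total, valid_count, avg)
}
```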

In `@eval/src/functions/mod.rs`:
- Around line 74-93: The registration wrapper in mod.rs currently performs
function index management (checking fid, fetching via state::state_get,
deserializing, pushing fid, and persisting with state::state_set using
ingest::SCOPE_INDEX and ingest::INDEX_KEY); move this entire index-update block
into ingest::handle so related ingest/register logic lives together—remove the
index update from the registration wrapper and call or invoke the updated
ingest::handle path that performs: retrieve index with state::state_get, safely
deserialize to Vec<String>, check contains(&fid), push fid, and persist with
state::state_set, preserving the tracing::warn on deserialize or state_set
errors. Ensure you reference the same symbols (fid, state::state_get,
state::state_set, ingest::SCOPE_INDEX, ingest::INDEX_KEY, ingest::handle) and
keep behavior (no-op if fid empty) unchanged.
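The invariants the comment asks to preserve (no-op on empty fid, no duplicates) fit in one small function wherever the block ends up. A sketch over a plain Vec in place of the state round-trip:

```rust
// Returns true when the index changed and needs persisting via state_set.
fn update_index(index: &mut Vec<String>, fid: &str) -> bool {
    if fid.is_empty() {
        return false; // preserve the no-op for empty function ids
    }
    if index.iter().any(|existing| existing == fid) {
        return false; // already tracked
    }
    index.push(fid.to_string());
    true
}
```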

In `@eval/src/functions/report.rs`:
- Around line 35-42: Replace the silent .ok() drop of errors when calling
crate::functions::drift::handle so drift failures are observable: change the
handling around the drift_result (the block using has_baseline, drift_payload,
and the call to crate::functions::drift::handle(iii, config, drift_payload)) to
capture the Result, log the error via the existing logger or attach an error
string to the report (e.g., a drift_error field) instead of returning None, and
ensure success still yields the drift payload; this preserves the existing flow
but surfaces errors for observability.
- Around line 18-50: The per-function loop currently calls
crate::functions::drift::handle(...) for each fid; instead, call
crate::functions::drift::handle once outside the loop with a payload that omits
"function_id" to get drift results for all functions, then inside the loop
lookup the drift entry for each fid (e.g., map by function_id) when building
function_reports; adjust the existing variables (function_ids iteration,
existing/state_get logic, compute_metrics call) to use the single drift result
map rather than invoking drift::handle per fid and fall back to None when a fid
has no entry.
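For the first point, the shape of the fix is to carry the error rather than drop it. A sketch with hypothetical report fields (the real report is JSON; `drift_error` is the field name suggested above):

```rust
// Hypothetical slice of a per-function report: either drift data or an error string.
#[derive(Debug, PartialEq)]
struct DriftSection {
    drift: Option<String>,
    drift_error: Option<String>,
}

// Capture the Result instead of calling .ok(): failures stay observable.
fn attach_drift(drift_result: Result<String, String>) -> DriftSection {
    match drift_result {
        Ok(payload) => DriftSection { drift: Some(payload), drift_error: None },
        Err(e) => DriftSection {
            drift: None,
            drift_error: Some(format!("drift check failed: {e}")),
        },
    }
}
```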

In `@eval/src/functions/score.rs`:
- Line 7: The handle function in eval/src/functions/score.rs currently hardcodes
thresholds (0.95, 5000ms, 0.80, 10000ms); change its signature to accept an
&EvalConfig (or a reference to it) and replace those literal values with the
corresponding fields from EvalConfig (e.g., drift_threshold and
timeout/threshold fields), and pass that config into drift::handle (which
already expects &EvalConfig); update any callers of score::handle to supply the
EvalConfig reference so scoring behavior becomes configurable.
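The change is mostly plumbing: thread threshold fields through instead of literals. A sketch with hypothetical field names (the current EvalConfig does not define them yet):

```rust
// Hypothetical config fields replacing the 0.95 / 5000ms literals.
struct ScoreThresholds {
    success_rate_floor: f64,
    p99_warn_ms: f64,
}

fn success_penalty(success_rate: f64, t: &ScoreThresholds) -> f64 {
    if success_rate < t.success_rate_floor {
        (t.success_rate_floor - success_rate) * 200.0
    } else {
        0.0
    }
}

fn p99_penalty(p99_ms: f64, t: &ScoreThresholds) -> f64 {
    if p99_ms > t.p99_warn_ms {
        ((p99_ms - t.p99_warn_ms) / 1000.0).min(30.0)
    } else {
        0.0
    }
}
```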

In `@eval/src/manifest.rs`:
- Around line 17-23: The manifest hardcodes default_config JSON that duplicates
EvalConfig::default(), causing drift; replace the hardcoded serde_json::json!
block with a conversion from EvalConfig::default() (e.g., serialize or convert
EvalConfig::default() into a serde_json::Value) so default_config is derived
from EvalConfig::default() instead of duplicated; update the code that
constructs default_config (the variable named default_config in this diff) to
call EvalConfig::default() and serialize/convert it into the same JSON/Value
shape used elsewhere.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 82ebb290-77da-43ab-a529-29e62ee9cc33

📥 Commits

Reviewing files that changed from the base of the PR and between 606a4d4 and 4068e9c.

📒 Files selected for processing (17)
  • eval/Cargo.toml
  • eval/README.md
  • eval/SPEC.md
  • eval/build.rs
  • eval/config.yaml
  • eval/src/config.rs
  • eval/src/functions/analyze.rs
  • eval/src/functions/baseline.rs
  • eval/src/functions/drift.rs
  • eval/src/functions/ingest.rs
  • eval/src/functions/metrics.rs
  • eval/src/functions/mod.rs
  • eval/src/functions/report.rs
  • eval/src/functions/score.rs
  • eval/src/functions/state.rs
  • eval/src/main.rs
  • eval/src/manifest.rs

Comment on lines +40 to +46
```
Options:
--config <PATH> Path to config.yaml [default: ./config.yaml]
--url <URL> WebSocket URL of the iii engine [default: ws://127.0.0.1:49134]
--manifest Output module manifest as JSON and exit
-h, --help Print help
```

⚠️ Potential issue | 🟡 Minor

Add a language to the fenced options block.

At Line 40, the code fence has no language tag (MD040).

Proposed fix
-```
+```text
 Options:
   --config <PATH>    Path to config.yaml [default: ./config.yaml]
   --url <URL>        WebSocket URL of the iii engine [default: ws://127.0.0.1:49134]
   --manifest         Output module manifest as JSON and exit
   -h, --help         Print help
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 40-40: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@eval/README.md` around lines 40 - 46, The fenced code block that begins with
"Options:" is missing a language tag; update the opening fence from ``` to
```text (i.e., change the fence that opens the "Options:" block to ```text) to
satisfy MD040; leave the block contents and closing fence unchanged.

Comment on lines +25 to +29
```
eval:spans:{function_id} — array of span objects (capped at max_spans_per_function)
eval:baselines:{function_id} — baseline metrics snapshot for drift comparison
eval:function_index — list of all tracked function IDs
```

⚠️ Potential issue | 🟡 Minor

State scope key mismatch with implementation.

The spec says eval:function_index but the code (ingest.rs:10-11) uses:

  • SCOPE_INDEX = "eval:index"
  • INDEX_KEY = "function_list"

Update the spec to match the implementation.

Suggested fix

eval:spans:{function_id} — array of span objects (capped at max_spans_per_function)
eval:baselines:{function_id} — baseline metrics snapshot for drift comparison
-eval:function_index — list of all tracked function IDs
+eval:index:function_list — list of all tracked function IDs

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 25-25: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/SPEC.md` around lines 25 - 29, Update the spec entry to match the
implementation: replace the line that documents eval:function_index with the
implemented scope and key by using the SCOPE_INDEX and INDEX_KEY names — i.e.,
document the state key as eval:index:function_list and the description as "list
of all tracked function IDs" so the spec aligns with the constants SCOPE_INDEX
and INDEX_KEY used in ingest.rs.

Comment on lines +4 to +17
#[derive(Deserialize, Debug, Clone)]
pub struct EvalConfig {
#[serde(default = "default_retention_hours")]
pub retention_hours: u64,
#[serde(default = "default_drift_threshold")]
pub drift_threshold: f64,
#[serde(default = "default_cron_drift_check")]
pub cron_drift_check: String,
#[serde(default = "default_max_spans_per_function")]
pub max_spans_per_function: usize,
#[allow(dead_code)]
#[serde(default = "default_baseline_window_minutes")]
pub baseline_window_minutes: u64,
}

⚠️ Potential issue | 🟠 Major

Reject unknown config keys to avoid silent misconfiguration.

A typo in YAML keys will currently be ignored and defaults will be applied silently. Add deny_unknown_fields on EvalConfig so invalid config fails fast.

Proposed fix
-#[derive(Deserialize, Debug, Clone)]
+#[derive(Deserialize, Debug, Clone)]
+#[serde(deny_unknown_fields)]
 pub struct EvalConfig {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[derive(Deserialize, Debug, Clone)]
pub struct EvalConfig {
#[serde(default = "default_retention_hours")]
pub retention_hours: u64,
#[serde(default = "default_drift_threshold")]
pub drift_threshold: f64,
#[serde(default = "default_cron_drift_check")]
pub cron_drift_check: String,
#[serde(default = "default_max_spans_per_function")]
pub max_spans_per_function: usize,
#[allow(dead_code)]
#[serde(default = "default_baseline_window_minutes")]
pub baseline_window_minutes: u64,
}
#[derive(Deserialize, Debug, Clone)]
#[serde(deny_unknown_fields)]
pub struct EvalConfig {
#[serde(default = "default_retention_hours")]
pub retention_hours: u64,
#[serde(default = "default_drift_threshold")]
pub drift_threshold: f64,
#[serde(default = "default_cron_drift_check")]
pub cron_drift_check: String,
#[serde(default = "default_max_spans_per_function")]
pub max_spans_per_function: usize,
#[allow(dead_code)]
#[serde(default = "default_baseline_window_minutes")]
pub baseline_window_minutes: u64,
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/config.rs` around lines 4 - 17, The EvalConfig struct allows
unknown/typoed config keys to be ignored; add serde's deny_unknown_fields
attribute to EvalConfig so unknown YAML keys cause deserialization to fail fast.
Modify the struct annotation for EvalConfig to include
#[serde(deny_unknown_fields)] (in addition to the existing derives) so
deserializing into EvalConfig will reject unexpected fields and surface
configuration mistakes.

Comment on lines +13 to +18
let existing = state_get(iii, crate::functions::ingest::SCOPE_SPANS, function_id).await.unwrap_or(json!(null));
let spans: Vec<Value> = if existing.is_array() {
serde_json::from_value(existing).unwrap_or_default()
} else {
Vec::new()
};

⚠️ Potential issue | 🟠 Major

Do not collapse state read failures into "no span data".

Line 13 currently turns all state_get errors into null, which can hide backend/read failures and produce misleading baseline responses.

Proposed fix
-    let existing = state_get(iii, crate::functions::ingest::SCOPE_SPANS, function_id).await.unwrap_or(json!(null));
+    let existing = state_get(iii, crate::functions::ingest::SCOPE_SPANS, function_id)
+        .await
+        .map_err(|e| IIIError::Handler(format!("failed to read spans for {function_id}: {e}")))?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let existing = state_get(iii, crate::functions::ingest::SCOPE_SPANS, function_id).await.unwrap_or(json!(null));
let spans: Vec<Value> = if existing.is_array() {
serde_json::from_value(existing).unwrap_or_default()
} else {
Vec::new()
};
let existing = state_get(iii, crate::functions::ingest::SCOPE_SPANS, function_id)
.await
.map_err(|e| IIIError::Handler(format!("failed to read spans for {function_id}: {e}")))?;
let spans: Vec<Value> = if existing.is_array() {
serde_json::from_value(existing).unwrap_or_default()
} else {
Vec::new()
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/baseline.rs` around lines 13 - 18, The code currently
collapses all state_get failures into json!(null) which hides backend/read
errors; change the state read to explicitly handle the Result from state_get (do
not use unwrap_or(json!(null))). Match on state_get(...).await: on Err(e)
propagate or return the error (or log and return a failure response) so callers
know the read failed; on Ok(existing) then check if existing.is_array() and
parse the array into spans (using serde_json::from_value on existing) and treat
Null or non-array as empty Vec. Update handling around the variables state_get,
SCOPE_SPANS, function_id, existing, and spans so read errors are not converted
into "no span data".

Comment on lines +56 to +74
if success_rate < 0.95 {
let penalty = (0.95 - success_rate) * 200.0;
fn_score -= penalty;
issues.push(json!({
"function_id": fid,
"issue": "low_success_rate",
"value": success_rate,
}));
}

if p99 > 5000 {
let penalty = ((p99 as f64 - 5000.0) / 1000.0).min(30.0);
fn_score -= penalty;
issues.push(json!({
"function_id": fid,
"issue": "high_p99_latency",
"value_ms": p99,
}));
}

⚠️ Potential issue | 🟡 Minor

Hardcoded scoring thresholds may conflict with drift detection logic.

The scoring penalties use absolute thresholds (success_rate < 0.95, p99 > 5000ms), while drift detection (per context snippet from drift.rs:53-79) uses a configurable relative threshold (default 15%). This can produce contradictory signals:

  • A function with 94% success rate triggers a score penalty but may not trigger drift if baseline was 93%.
  • A function degrading from 99% to 86% (13% relative change) won't trigger drift but will trigger scoring issues.

Consider aligning the thresholds or documenting the intentional difference between "absolute health" (score) vs "relative change" (drift).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/functions/score.rs` around lines 56 - 74, Scoring uses hardcoded
absolute thresholds (success_rate < 0.95, p99 > 5000) which can conflict with
the relative drift logic (15% default); change the scoring in the function that
adjusts fn_score (the block referencing fn_score, success_rate, p99, and
issues.push) to use configurable thresholds (e.g., success_rate_threshold and
p99_threshold) or compute thresholds from the same relative baseline used by
drift detection, then read those values from the same config/params used by
drift.rs (or surface them in the shared config struct), and update issue
payloads/tests/docs to reflect the configurable/derived thresholds so scoring
and drift detection align or explicitly document the intended difference between
absolute health vs relative drift.
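The two contradictory examples above can be checked numerically. A sketch of the two signals as pure predicates (0.95 and 0.15 are the values quoted in the comment):

```rust
// Score penalty fires on an absolute floor.
fn score_penalizes(success_rate: f64) -> bool {
    success_rate < 0.95
}

// Drift fires on relative change against the baseline.
fn drift_flags(baseline: f64, current: f64) -> bool {
    ((current - baseline) / baseline).abs() > 0.15
}
```

A 94% success rate with a 93% baseline penalizes the score without flagging drift, and a drop from 99% to 86% (about 13% relative) still does not flag drift while the score penalty fires.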

Comment on lines +62 to +71
let iii = register_worker(
&cli.url,
InitOptions {
otel: Some(OtelConfig::default()),
..Default::default()
},
);

let iii_arc = Arc::new(iii);
functions::register_all(&iii_arc, &config);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the return type and error handling of register_worker in iii_sdk
# to verify if it can fail and how errors are propagated

# Search for register_worker function definition or usage patterns
rg -n "register_worker" --type rust -C 5

# Look for Result types in the SDK imports or worker patterns
ast-grep --pattern 'fn register_worker($$$) -> $RET { $$$ }'

Repository: iii-hq/workers

Length of output: 1869


🏁 Script executed:

find . -name "*.rs" -type f | xargs grep -l "pub fn register_worker\|pub async fn register_worker" | head -20

Repository: iii-hq/workers

Length of output: 40


🏁 Script executed:

cat Cargo.toml | grep -A 5 "\[dependencies\]" | head -20

Repository: iii-hq/workers

Length of output: 100


🏁 Script executed:

# Check if iii_sdk is local or external
ls -la | grep -E "iii_sdk|Cargo"

Repository: iii-hq/workers

Length of output: 40


🏁 Script executed:

# Look at other main.rs files to see if there's a pattern for error handling
fd "main\.rs" --type f | xargs rg -l "register_worker" | head -10

Repository: iii-hq/workers

Length of output: 99


🏁 Script executed:

find . -name "Cargo.toml" -type f | head -20

Repository: iii-hq/workers

Length of output: 101


🏁 Script executed:

# Check for iii_sdk in git or as a path dependency
rg "iii_sdk" --type toml -C 3

Repository: iii-hq/workers

Length of output: 40


🏁 Script executed:

# Look for documentation or comments about register_worker behavior
rg "register_worker" --type rust -C 10 | head -100

Repository: iii-hq/workers

Length of output: 3243


🏁 Script executed:

cat eval/Cargo.toml

Repository: iii-hq/workers

Length of output: 641


🏁 Script executed:

cat image-resize/Cargo.toml

Repository: iii-hq/workers

Length of output: 709


🏁 Script executed:

# Check if main() function has special error handling or try-catch
rg "fn main|async fn main" --type rust -A 20 eval/src/main.rs | head -40

Repository: iii-hq/workers

Length of output: 742


🏁 Script executed:

# Check if there are any panic catches or error recovery patterns
rg "panic|catch_unwind|std::panic" --type rust

Repository: iii-hq/workers

Length of output: 150


🏁 Script executed:

# Search for any comments or documentation about register_worker behavior
rg "register_worker" --type rust --type md --type txt -B 5 -A 5

Repository: iii-hq/workers

Length of output: 1771


🏁 Script executed:

# Look for test files that might show register_worker behavior
find . -name "*test*" -o -name "*spec*" | grep -E "\.rs$"

Repository: iii-hq/workers

Length of output: 40


🏁 Script executed:

# Check how 'iii' is used after register_worker to infer its type/behavior
rg "let iii = register_worker" --type rust -A 20

Repository: iii-hq/workers

Length of output: 2408


🏁 Script executed:

# Look for any README or documentation files
find . -name "README*" -o -name "*.md" | head -10

Repository: iii-hq/workers

Length of output: 134


🏁 Script executed:

cat README.md

Repository: iii-hq/workers

Length of output: 1591


🏁 Script executed:

cat eval/README.md

Repository: iii-hq/workers

Length of output: 1933


Add error handling for register_worker connection failures.

The register_worker call connects to the III engine via WebSocket synchronously, but the connection error is not exposed for handling. If the engine is unreachable at startup, the behavior is unclear (likely panic or hang). Since main() returns Result<()>, consider wrapping the connection attempt in error handling to enable graceful startup failure with a clear error message when the engine is unavailable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/main.rs` around lines 62 - 71, register_worker() may fail to connect
and its error is not handled; change the code so you capture the Result from
register_worker (instead of unwrapping or ignoring errors), propagate or map the
error into main()'s Result with a clear message (e.g., using ? or
map_err/anyhow::Context), only create iii_arc and call
functions::register_all(&iii_arc, &config) after a successful register_worker
return, and return an Err from main() with a descriptive error when the
connection to the III engine cannot be established.
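Since the SDK's register_worker signature could not be located by the scripts above, the shape of the fix can only be sketched against a hypothetical Result-returning connect (none of these names are from iii_sdk):

```rust
// Stand-in for a fallible connect; the real register_worker may panic, hang,
// or return a handle whose failure only surfaces later.
fn register_worker_checked(url: &str) -> Result<String, String> {
    if url.starts_with("ws://") {
        Ok(format!("connected to {url}"))
    } else {
        Err(format!("unsupported scheme in '{url}'"))
    }
}

// main()-style wrapper: map the connection error into a clear startup failure
// before registering any functions or triggers.
fn startup(url: &str) -> Result<String, String> {
    register_worker_checked(url)
        .map_err(|e| format!("failed to connect to III engine at {url}: {e}"))
}
```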

Comment on lines +75 to +103
match iii_arc.register_trigger(RegisterTriggerInput {
trigger_type: "cron".to_string(),
function_id: "eval::drift".to_string(),
config: json!({ "expression": cron_expression }),
}) {
Ok(_) => tracing::info!("cron trigger registered for eval::drift"),
Err(e) => tracing::warn!(error = %e, "failed to register cron trigger"),
}

match iii_arc.register_trigger(RegisterTriggerInput {
trigger_type: "subscribe".to_string(),
function_id: "eval::ingest".to_string(),
config: json!({ "topic": "telemetry.spans" }),
}) {
Ok(_) => tracing::info!("subscribe trigger registered for eval::ingest on telemetry.spans"),
Err(e) => tracing::warn!(error = %e, "failed to register subscribe trigger"),
}

match iii_arc.register_trigger(RegisterTriggerInput {
trigger_type: "http".to_string(),
function_id: "eval::analyze_traces".to_string(),
config: json!({
"api_path": "eval/analyze",
"http_method": "POST"
}),
}) {
Ok(_) => tracing::info!("http trigger registered for eval::analyze_traces"),
Err(e) => tracing::warn!(error = %e, "failed to register http trigger for analyze_traces"),
}

⚠️ Potential issue | 🟠 Major

Trigger registration failures silently degrade the worker.

All three trigger registrations log warnings on failure but allow the worker to continue. This means the worker could report "ready" while being non-functional (e.g., no span ingestion if subscribe trigger fails, no drift detection if cron fails).

Consider either:

  1. Treating trigger registration failures as fatal errors that prevent startup
  2. Exposing registration status via a health check endpoint
  3. At minimum, logging at error level rather than warn since these are critical capabilities
🛠️ Proposed fix: Fail fast on critical trigger registration failures
-    match iii_arc.register_trigger(RegisterTriggerInput {
+    iii_arc.register_trigger(RegisterTriggerInput {
         trigger_type: "cron".to_string(),
         function_id: "eval::drift".to_string(),
         config: json!({ "expression": cron_expression }),
-    }) {
-        Ok(_) => tracing::info!("cron trigger registered for eval::drift"),
-        Err(e) => tracing::warn!(error = %e, "failed to register cron trigger"),
-    }
+    }).map_err(|e| {
+        tracing::error!(error = %e, "failed to register cron trigger");
+        e
+    })?;
+    tracing::info!("cron trigger registered for eval::drift");

-    match iii_arc.register_trigger(RegisterTriggerInput {
+    iii_arc.register_trigger(RegisterTriggerInput {
         trigger_type: "subscribe".to_string(),
         function_id: "eval::ingest".to_string(),
         config: json!({ "topic": "telemetry.spans" }),
-    }) {
-        Ok(_) => tracing::info!("subscribe trigger registered for eval::ingest on telemetry.spans"),
-        Err(e) => tracing::warn!(error = %e, "failed to register subscribe trigger"),
-    }
+    }).map_err(|e| {
+        tracing::error!(error = %e, "failed to register subscribe trigger");
+        e
+    })?;
+    tracing::info!("subscribe trigger registered for eval::ingest on telemetry.spans");

-    match iii_arc.register_trigger(RegisterTriggerInput {
+    iii_arc.register_trigger(RegisterTriggerInput {
         trigger_type: "http".to_string(),
         function_id: "eval::analyze_traces".to_string(),
         config: json!({
             "api_path": "eval/analyze",
             "http_method": "POST"
         }),
-    }) {
-        Ok(_) => tracing::info!("http trigger registered for eval::analyze_traces"),
-        Err(e) => tracing::warn!(error = %e, "failed to register http trigger for analyze_traces"),
-    }
+    }).map_err(|e| {
+        tracing::error!(error = %e, "failed to register http trigger for analyze_traces");
+        e
+    })?;
+    tracing::info!("http trigger registered for eval::analyze_traces");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eval/src/main.rs` around lines 75 - 103, The trigger registration code calls
iii_arc.register_trigger for three critical triggers (cron -> function_id
"eval::drift", subscribe -> "eval::ingest", and http -> "eval::analyze_traces")
but only logs warnings on failure; change this to fail fast by treating
registration errors as fatal: propagate the Err out of main (or return an
Err/exit process) instead of continuing, e.g., replace the Err branches that do
tracing::warn!(...) with returning the error (or calling
anyhow::bail!/process::exit) and log at error level with context including the
register_trigger error and which RegisterTriggerInput failed (reference the
register_trigger calls and the RegisterTriggerInput structs/cron_expression) so
startup aborts when any of these critical triggers cannot be registered.
