📝 Walkthrough

A new evaluation worker (…)
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant III as III Engine
    participant PubSub
    participant State
    participant Handlers
    Note over Client,Handlers: Span Ingestion Flow
    PubSub->>III: Publish telemetry.spans
    III->>Handlers: Trigger eval::ingest
    Handlers->>State: Load existing spans
    Handlers->>State: Append new span & cap
    State-->>Handlers: Confirm stored
    Handlers-->>III: Return result
    Note over Client,Handlers: Metrics & Scoring Flow
    Client->>III: POST /eval/metrics
    III->>Handlers: Trigger eval::metrics
    Handlers->>State: Load function spans
    Handlers->>Handlers: Compute p50/p95/p99, rates
    Handlers-->>III: Return metrics JSON
    Client->>III: POST /eval/score
    III->>Handlers: Trigger eval::score
    Handlers->>State: Load all function spans
    Handlers->>Handlers: Aggregate & penalize scores
    Handlers-->>III: Return health score
```

```mermaid
sequenceDiagram
    participant Client
    participant III as III Engine
    participant State
    participant Handlers
    participant Cron
    Note over Client,Handlers: Baseline & Drift Detection
    Client->>III: POST /eval/baseline?function_id=f1
    III->>Handlers: Trigger eval::baseline
    Handlers->>State: Load spans for f1
    Handlers->>Handlers: Compute current metrics
    Handlers->>State: Store baseline snapshot
    State-->>Handlers: Confirm
    Handlers-->>III: Return saved baseline
    Cron->>III: Trigger (cron: every 10min)
    III->>Handlers: Trigger eval::drift
    Handlers->>State: Load baseline for each function
    Handlers->>State: Load current spans
    Handlers->>Handlers: Compare metrics dimensions
    Handlers->>Handlers: Calculate delta percentages
    Handlers-->>III: Return drift results
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
OTel-native evaluation worker for iii engine. 7 functions:

- eval::ingest — receive span data via PubSub
- eval::metrics — P50/P95/P99 latency, success rate, throughput
- eval::score — system-wide health score (0-100)
- eval::drift — detect metric drift against baselines
- eval::baseline — snapshot current metrics
- eval::report — full report with metrics + score + drift
- eval::analyze_traces — read OTel traces, find patterns, generate insights

Uses iii primitives: State (spans, baselines), Cron (drift check), Subscribe (telemetry.spans), HTTP triggers. 19 tests, 0 warnings.
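The latency percentiles listed for eval::metrics can be computed with a nearest-rank method over the stored span durations; a minimal std-only sketch of that technique (the module's exact formula may differ):

```rust
/// Nearest-rank percentile over sorted duration samples in milliseconds.
/// Returns 0.0 for an empty slice. Illustrative, not the module's code.
fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    if sorted_ms.is_empty() {
        return 0.0;
    }
    // Nearest-rank: index = ceil(p/100 * N) - 1, clamped to the last element.
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
}

/// Sort once, then read off p50/p95/p99.
fn latency_summary(mut durations_ms: Vec<f64>) -> (f64, f64, f64) {
    durations_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    (
        percentile(&durations_ms, 50.0),
        percentile(&durations_ms, 95.0),
        percentile(&durations_ms, 99.0),
    )
}
```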
Actionable comments posted: 8
🧹 Nitpick comments (12)
eval/src/manifest.rs (1)
17-23: Avoid duplicating defaults between manifest and runtime config.
`default_config` is hardcoded here; it can drift from `EvalConfig::default()` over time.

Proposed fix
```diff
+use crate::config::EvalConfig;
+
 pub fn build_manifest() -> ModuleManifest {
     ModuleManifest {
         name: env!("CARGO_PKG_NAME").to_string(),
         version: env!("CARGO_PKG_VERSION").to_string(),
         description: "III engine OTel-native evaluation worker".to_string(),
-        default_config: serde_json::json!({
-            "retention_hours": 24,
-            "drift_threshold": 0.15,
-            "cron_drift_check": "0 */10 * * * *",
-            "max_spans_per_function": 1000,
-            "baseline_window_minutes": 60
-        }),
+        default_config: serde_json::to_value(EvalConfig::default())
+            .expect("EvalConfig should always serialize"),
         supported_targets: vec![env!("TARGET").to_string()],
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eval/src/manifest.rs` around lines 17 - 23, The manifest hardcodes default_config JSON that duplicates EvalConfig::default(), causing drift; replace the hardcoded serde_json::json! block with a conversion from EvalConfig::default() (e.g., serialize or convert EvalConfig::default() into a serde_json::Value) so default_config is derived from EvalConfig::default() instead of duplicated; update the code that constructs default_config (the variable named default_config in this diff) to call EvalConfig::default() and serialize/convert it into the same JSON/Value shape used elsewhere.

eval/src/config.rs (1)
51-54: Validate semantic ranges after deserialization.
`load_config` should enforce constraints (for example: `drift_threshold` in `[0.0, 1.0]`, non-zero `max_spans_per_function`) before returning config.

Proposed fix
```diff
 pub fn load_config(path: &str) -> Result<EvalConfig> {
     let contents = std::fs::read_to_string(path)?;
     let config: EvalConfig = serde_yaml::from_str(&contents)?;
+    if !(0.0..=1.0).contains(&config.drift_threshold) {
+        anyhow::bail!("drift_threshold must be between 0.0 and 1.0");
+    }
+    if config.max_spans_per_function == 0 {
+        anyhow::bail!("max_spans_per_function must be > 0");
+    }
     Ok(config)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eval/src/config.rs` around lines 51 - 54, After deserializing into EvalConfig in load_config, validate semantic constraints before returning: check that EvalConfig.drift_threshold is within 0.0..=1.0 and that EvalConfig.max_spans_per_function is > 0 (and add any other domain rules required by EvalConfig); if any check fails return an Err with a clear message (e.g., using anyhow::bail or Result::Err with a descriptive string) instead of Ok(config). Ensure the validation lives in load_config immediately after serde_yaml::from_str so callers never receive an out-of-range EvalConfig.

eval/src/functions/report.rs (2)
35-42: Silent error swallowing may hide drift detection failures.

Using `.ok()` to ignore drift handler errors means failures (e.g., state read errors, serialization issues) are silently discarded. Consider logging drift failures or including a `drift_error` field in the report for observability.

Suggested improvement
```diff
 let drift_result = if has_baseline {
     let drift_payload = json!({ "function_id": fid });
-    crate::functions::drift::handle(iii, config, drift_payload)
-        .await
-        .ok()
+    match crate::functions::drift::handle(iii, config, drift_payload).await {
+        Ok(result) => Some(result),
+        Err(e) => {
+            tracing::warn!(function_id = %fid, error = %e, "drift check failed");
+            None
+        }
+    }
 } else {
     None
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eval/src/functions/report.rs` around lines 35 - 42, Replace the silent .ok() drop of errors when calling crate::functions::drift::handle so drift failures are observable: change the handling around the drift_result (the block using has_baseline, drift_payload, and the call to crate::functions::drift::handle(iii, config, drift_payload)) to capture the Result, log the error via the existing logger or attach an error string to the report (e.g., a drift_error field) instead of returning None, and ensure success still yields the drift payload; this preserves the existing flow but surfaces errors for observability.
18-50: Per-function drift calls are inefficient when checking all functions.

The loop calls `drift::handle` individually for each function with a baseline. However, `drift::handle` already supports processing all functions when `function_id` is omitted from the payload. Consider a single drift call for the full report, then matching results by function_id.

🤖 Prompt for AI Agents
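A std-only sketch of the single-call approach: run drift once for every function, then index the batch result by function_id so the report loop does O(1) lookups (the struct and field names are assumed for illustration, not the module's actual types):

```rust
use std::collections::HashMap;

/// One drift result row as returned per function; fields are illustrative.
#[derive(Clone, Debug)]
struct DriftResult {
    function_id: String,
    drifted: bool,
}

/// Index a batch drift response by function_id so the report builder can
/// look up each function instead of calling drift once per function.
fn index_by_function(results: Vec<DriftResult>) -> HashMap<String, DriftResult> {
    results
        .into_iter()
        .map(|r| (r.function_id.clone(), r))
        .collect()
}
```

A function with no entry in the map simply falls back to `None` in the report, matching the current behavior.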
Verify each finding against the current code and only fix it if needed. In `@eval/src/functions/report.rs` around lines 18 - 50, The per-function loop currently calls crate::functions::drift::handle(...) for each fid; instead, call crate::functions::drift::handle once outside the loop with a payload that omits "function_id" to get drift results for all functions, then inside the loop lookup the drift entry for each fid (e.g., map by function_id) when building function_reports; adjust the existing variables (function_ids iteration, existing/state_get logic, compute_metrics call) to use the single drift result map rather than invoking drift::handle per fid and fall back to None when a fid has no entry.

eval/src/functions/ingest.rs (1)
21-26: Error message could be more descriptive.

The error message "missing {field}" doesn't indicate whether the field was absent or had wrong type. Consider distinguishing these cases.
Suggested improvement
```diff
 fn require_u64(payload: &Value, field: &str) -> Result<u64, IIIError> {
-    payload
-        .get(field)
-        .and_then(|v| v.as_u64())
-        .ok_or_else(|| IIIError::Handler(format!("missing {field}")))
+    match payload.get(field) {
+        Some(v) => v.as_u64().ok_or_else(|| {
+            IIIError::Handler(format!("{field} must be a non-negative integer"))
+        }),
+        None => Err(IIIError::Handler(format!("missing required field: {field}"))),
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eval/src/functions/ingest.rs` around lines 21 - 26, The require_u64 helper (fn require_u64) currently returns a generic "missing {field}" error; change it to distinguish between a missing key and a present-but-wrong-type value by first checking payload.get(field): if None return IIIError::Handler(format!("missing field '{}'", field)), else if value.as_u64() is None return IIIError::Handler(format!("field '{}' has wrong type: expected u64, found {}", field, value.type_name() or value.to_string())), so callers can see whether the key was absent or contained a non-u64 value; reference require_u64 and use payload.get(field) and value.as_u64() to implement this branching.

eval/src/functions/metrics.rs (1)
48-53: Potential mismatch between `total` and `durations.len()`.

`total` is set to `spans.len()` but `avg_duration_ms` divides by `durations.len()`. If some spans are missing `duration_ms`, these counts differ. This is intentional (average only over valid durations), but `total_invocations` in the output may not match the denominator used for the average. Consider documenting this or using consistent counts.

🤖 Prompt for AI Agents
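One way to sketch the consistent-count fix: report both totals explicitly so the average's denominator is unambiguous (the struct and field names here are illustrative, not the module's actual output schema):

```rust
/// Summary that distinguishes all ingested spans from spans that carried
/// a usable duration, so avg_duration_ms is unambiguous.
struct DurationSummary {
    total_invocations: usize,
    valid_durations: usize,
    avg_duration_ms: f64,
}

/// Average only over spans that actually have a duration, but surface
/// both counts so consumers can see the difference.
fn summarize(durations: &[Option<f64>]) -> DurationSummary {
    let valid: Vec<f64> = durations.iter().filter_map(|d| *d).collect();
    let avg = if valid.is_empty() {
        0.0
    } else {
        valid.iter().sum::<f64>() / valid.len() as f64
    };
    DurationSummary {
        total_invocations: durations.len(),
        valid_durations: valid.len(),
        avg_duration_ms: avg,
    }
}
```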
Verify each finding against the current code and only fix it if needed. In `@eval/src/functions/metrics.rs` around lines 48 - 53, The code currently sets total = spans.len() while avg_duration_ms divides by durations.len(), causing total_invocations (total) to differ from the average's denominator if some spans lack duration_ms; update the calculation to use a consistent count: compute a new valid_count = durations.len() (or filter spans to count only those with duration_ms), use valid_count for the average denominator and for total_invocations (or explicitly document that total_invocations reflects all spans while average is over valid durations). Locate the variables total, spans, durations.len(), avg_duration_ms and total_invocations in metrics.rs and either set total = durations.len() (or total = valid_count) or add a separate total_valid_durations and use that for avg_duration_ms and output consistently.

eval/src/functions/analyze.rs (2)
256-268: Division guard relies on filter correctness.

Line 260 divides by `s.invocations` after filtering for `s.error_count > 0`. This is safe because `error_count` can only be incremented when `invocations` is also incremented (lines 159, 167), but the implicit coupling could be fragile. A defensive guard would be safer.

Optional defensive guard
```diff
 let errors: Vec<Value> = stats_map
     .iter()
-    .filter(|(_, s)| s.error_count > 0)
+    .filter(|(_, s)| s.error_count > 0 && s.invocations > 0)
     .map(|(fid, s)| {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eval/src/functions/analyze.rs` around lines 256 - 268, The current mapping that builds errors from stats_map computes error_rate by dividing s.error_count by s.invocations (in the map that produces the `errors: Vec<Value>`), but this relies on the invariant that invocations > 0; add a defensive guard inside that closure used in the map (referencing `stats_map`, the closure parameters `(fid, s)`, and `s.invocations`) so that if `s.invocations == 0` you set error_rate to 0.0 (or skip the entry) instead of performing the division, and then continue to construct the JSON object with that guarded rate.
108-118: External trigger call should handle timeout gracefully.

The 10-second timeout is reasonable, but the error message "failed to fetch traces from engine" doesn't distinguish between timeout vs other failures. Consider enriching the error context.
Suggested improvement
```diff
     .await
     .map_err(|e| {
-        IIIError::Handler(format!("failed to fetch traces from engine: {e}"))
+        IIIError::Handler(format!("failed to fetch traces from engine::traces::list: {e}"))
     })?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eval/src/functions/analyze.rs` around lines 108 - 118, The error mapping for the external call iii.trigger(TriggerRequest { timeout_ms: Some(10_000), ... }) should distinguish timeouts from other failures; update the map_err closure that produces IIIError::Handler for traces_response to inspect the returned error (e.g., check for an is_timeout() method or look for "timeout" in e.to_string()) and return a clearer message like "timed out fetching traces from engine" for timeout cases and keep the original "failed to fetch traces from engine: {e}" for other errors so callers can tell timeout vs other failures.

eval/src/functions/mod.rs (1)
74-93: Index update logic could be moved into `ingest::handle`.

The function index update (checking if `fid` exists, appending if not) is performed in the registration wrapper rather than in `ingest::handle`. This splits related logic across two locations. Consider moving this into `ingest::handle` for better cohesion.

🤖 Prompt for AI Agents
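Wherever the block ends up, the update itself should stay idempotent; a std-only sketch of the append-if-missing step, detached from the actual state API:

```rust
/// Append `fid` to the tracked-function index only if it is non-empty and
/// not already present. Returns true when the index changed (i.e. a
/// state write would be needed). Illustrative, not the module's code.
fn update_index(index: &mut Vec<String>, fid: &str) -> bool {
    if fid.is_empty() || index.iter().any(|f| f == fid) {
        return false;
    }
    index.push(fid.to_string());
    true
}
```

Returning a changed-flag lets the caller skip the state_set round trip when nothing was added.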
Verify each finding against the current code and only fix it if needed. In `@eval/src/functions/mod.rs` around lines 74 - 93, The registration wrapper in mod.rs currently performs function index management (checking fid, fetching via state::state_get, deserializing, pushing fid, and persisting with state::state_set using ingest::SCOPE_INDEX and ingest::INDEX_KEY); move this entire index-update block into ingest::handle so related ingest/register logic lives together—remove the index update from the registration wrapper and call or invoke the updated ingest::handle path that performs: retrieve index with state::state_get, safely deserialize to Vec<String>, check contains(&fid), push fid, and persist with state::state_set, preserving the tracing::warn on deserialize or state_set errors. Ensure you reference the same symbols (fid, state::state_get, state::state_set, ingest::SCOPE_INDEX, ingest::INDEX_KEY, ingest::handle) and keep behavior (no-op if fid empty) unchanged.

eval/src/functions/score.rs (1)
7-7: Consider accepting `EvalConfig` for configurable thresholds.

The `handle` function uses hardcoded thresholds (0.95, 5000ms, 0.80, 10000ms) while `drift::handle` takes `&EvalConfig` for its configurable `drift_threshold`. For consistency and operational flexibility, the scoring thresholds should also be configurable.

Suggested signature change
```diff
-pub async fn handle(iii: &Arc<III>, _payload: Value) -> Result<Value, IIIError> {
+pub async fn handle(iii: &Arc<III>, config: &EvalConfig, _payload: Value) -> Result<Value, IIIError> {
+    let success_threshold = config.score_success_threshold.unwrap_or(0.95);
+    let p99_threshold_ms = config.score_p99_threshold_ms.unwrap_or(5000);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eval/src/functions/score.rs` at line 7, The handle function in eval/src/functions/score.rs currently hardcodes thresholds (0.95, 5000ms, 0.80, 10000ms); change its signature to accept an &EvalConfig (or a reference to it) and replace those literal values with the corresponding fields from EvalConfig (e.g., drift_threshold and timeout/threshold fields), and pass that config into drift::handle (which already expects &EvalConfig); update any callers of score::handle to supply the EvalConfig reference so scoring behavior becomes configurable.

eval/SPEC.md (1)
13-19: Add language identifiers to fenced code blocks.

Static analysis flagged multiple code blocks without language specifiers. Add appropriate identifiers for better rendering and syntax highlighting.
Examples
````diff
-```
+```text
 Your Workers → OTel spans → PubSub topic "telemetry.spans" → eval::ingest

-```
+```json
 Input: {
   function_id: string,
````

Also applies to: 25-29, 35-48, 54-61, 67-76, 86-96, 102-106, 111-119, 125-133
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eval/SPEC.md` around lines 13 - 19, Several fenced code blocks in SPEC.md (e.g., the diagram block starting "Your Workers → OTel spans → PubSub topic \"telemetry.spans\" → eval::ingest" and the JSON-like block starting "Input: { function_id: string,") are missing language identifiers; update each triple-backtick fence to include an appropriate language label (e.g., ```text for plain diagrams, ```json for JSON examples, ```yaml for YAML snippets) for all affected blocks (also apply same change to the ranges noted: 25-29, 35-48, 54-61, 67-76, 86-96, 102-106, 111-119, 125-133) so that syntax highlighting and rendering are correct.

eval/src/functions/drift.rs (1)
53-67: Relative threshold may be inappropriate for success_rate.

A 15% relative change in success_rate means:
- 100% → 85% triggers drift (reasonable)
- 90% → 76.5% triggers drift (reasonable)
- 50% → 42.5% triggers drift (but 50% success rate is already critical)
Consider using absolute thresholds for success_rate (e.g., any drop > 5 percentage points) while keeping relative thresholds for latency metrics.
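A sketch of that mixed policy — absolute delta for success_rate, relative delta for the latency dimensions (the 5-point and 15% thresholds here are illustrative values taken from this discussion, not necessarily the module's configuration):

```rust
/// Decide whether one dimension drifted. success_rate uses an absolute
/// drop in percentage points; latency dimensions use a relative change
/// against the baseline. Threshold values are illustrative.
fn is_drifted(dimension: &str, baseline: f64, current: f64) -> bool {
    const ABS_SUCCESS_THRESHOLD: f64 = 0.05; // 5 percentage points
    const REL_THRESHOLD: f64 = 0.15; // 15% relative change

    match dimension {
        "success_rate" => (baseline - current) > ABS_SUCCESS_THRESHOLD,
        _ => {
            // Avoid dividing by a zero baseline for latency metrics.
            if baseline == 0.0 {
                return current != 0.0;
            }
            ((current - baseline) / baseline).abs() > REL_THRESHOLD
        }
    }
}
```

With this split, a 50% → 42% success rate still flags (8-point absolute drop), while latency comparisons keep the relative semantics described above.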
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eval/src/functions/drift.rs` around lines 53 - 67, In the loop over dimensions in drift.rs (the dimensions array and code computing baseline_v, current_v and delta_pct), special-case the "success_rate" dimension: instead of computing delta_pct as a relative change using baseline_v (the existing delta_pct logic), compute an absolute change as abs(current_v - baseline_v) and compare that to an absolute threshold (e.g., 0.05 for 5 percentage points); keep the current relative delta_pct calculation for the latency metrics ("p50_ms", "p95_ms", "p99_ms", "avg_duration_ms"). Ensure you reference the existing symbols baseline_val, baseline_v, current, current_v, and delta_pct when adding this branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@eval/README.md`:
- Around line 40-46: The fenced code block that begins with "Options:" is
missing a language tag; update the opening fence from ``` to ```text so the
block is marked (e.g., change the code fence before the "Options:" block to
```text) to satisfy MD040; leave the block contents and closing fence unchanged.
In `@eval/SPEC.md`:
- Line 31: The spec lists 6 functions but the implementation registers an extra
function eval::analyze_traces; add documentation for analyze_traces to
eval/SPEC.md (immediately after the eval::report section) using the suggested
block: specify Input: {limit?: integer, function_filter?: string}, Output: {
summary: {total_spans, unique_functions, time_range, error_rate},
slowest_functions: [...], most_active: [...], errors: [...], insights: [string]
} and a short description: "Reads OTel traces from the engine, aggregates
per-function stats, and generates insights about error rates and slow
functions." Ensure the new entry matches the formatting of the other function
docs so the total function count is consistent.
- Around line 25-29: Update the spec entry to match the implementation: replace
the line that documents eval:function_index with the implemented scope and key
by using the SCOPE_INDEX and INDEX_KEY names — i.e., document the state key as
eval:index:function_list and the description as "list of all tracked function
IDs" so the spec aligns with the constants SCOPE_INDEX and INDEX_KEY used in
ingest.rs.
In `@eval/src/config.rs`:
- Around line 4-17: The EvalConfig struct allows unknown/typoed config keys to
be ignored; add serde's deny_unknown_fields attribute to EvalConfig so unknown
YAML keys cause deserialization to fail fast. Modify the struct annotation for
EvalConfig to include #[serde(deny_unknown_fields)] (in addition to the existing
derives) so deserializing into EvalConfig will reject unexpected fields and
surface configuration mistakes.
In `@eval/src/functions/baseline.rs`:
- Around line 13-18: The code currently collapses all state_get failures into
json!(null) which hides backend/read errors; change the state read to explicitly
handle the Result from state_get (do not use unwrap_or(json!(null))). Match on
state_get(...).await: on Err(e) propagate or return the error (or log and return
a failure response) so callers know the read failed; on Ok(existing) then check
if existing.is_array() and parse the array into spans (using
serde_json::from_value on existing) and treat Null or non-array as empty Vec.
Update handling around the variables state_get, SCOPE_SPANS, function_id,
existing, and spans so read errors are not converted into "no span data".
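The distinction matters because "no baseline data yet" and "the state backend failed" call for different responses; a generic std-only sketch of the pattern (the real state_get signature is assumed, so `StateRead` is a stand-in):

```rust
/// Stand-in for whatever Result the real state_get returns.
type StateRead = Result<Option<Vec<u64>>, String>;

/// A read error propagates to the caller, while an absent value becomes
/// an empty span list. This is the shape of the fix, not the real code.
fn load_spans(read: StateRead) -> Result<Vec<u64>, String> {
    match read {
        Err(e) => Err(format!("state read failed: {e}")), // surface, don't swallow
        Ok(Some(spans)) => Ok(spans),
        Ok(None) => Ok(Vec::new()), // genuinely no data yet
    }
}
```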
In `@eval/src/functions/score.rs`:
- Around line 56-74: Scoring uses hardcoded absolute thresholds (success_rate <
0.95, p99 > 5000) which can conflict with the relative drift logic (15%
default); change the scoring in the function that adjusts fn_score (the block
referencing fn_score, success_rate, p99, and issues.push) to use configurable
thresholds (e.g., success_rate_threshold and p99_threshold) or compute
thresholds from the same relative baseline used by drift detection, then read
those values from the same config/params used by drift.rs (or surface them in
the shared config struct), and update issue payloads/tests/docs to reflect the
configurable/derived thresholds so scoring and drift detection align or
explicitly document the intended difference between absolute health vs relative
drift.
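One way to sketch the aligned scoring, with thresholds read from a shared config struct rather than hardcoded (the struct and penalty sizes are illustrative assumptions, not the module's actual numbers):

```rust
/// Illustrative thresholds; in the real module these would live on the
/// shared EvalConfig alongside drift_threshold.
struct ScoreThresholds {
    success_rate_min: f64,
    p99_ms_max: f64,
}

/// Clamp a per-function health score into 0..=100, penalizing threshold
/// violations. Penalty sizes are made up for the sketch.
fn function_score(success_rate: f64, p99_ms: f64, t: &ScoreThresholds) -> u32 {
    let mut score: f64 = 100.0;
    if success_rate < t.success_rate_min {
        // Penalize proportionally to how far below the floor we are.
        score -= (t.success_rate_min - success_rate) * 100.0;
    }
    if p99_ms > t.p99_ms_max {
        score -= 20.0;
    }
    score.clamp(0.0, 100.0) as u32
}
```

Keeping the thresholds on the config makes it possible to document, test, and tune absolute health scoring independently of relative drift detection.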
In `@eval/src/main.rs`:
- Around line 75-103: The trigger registration code calls
iii_arc.register_trigger for three critical triggers (cron -> function_id
"eval::drift", subscribe -> "eval::ingest", and http -> "eval::analyze_traces")
but only logs warnings on failure; change this to fail fast by treating
registration errors as fatal: propagate the Err out of main (or return an
Err/exit process) instead of continuing, e.g., replace the Err branches that do
tracing::warn!(...) with returning the error (or calling
anyhow::bail!/process::exit) and log at error level with context including the
register_trigger error and which RegisterTriggerInput failed (reference the
register_trigger calls and the RegisterTriggerInput structs/cron_expression) so
startup aborts when any of these critical triggers cannot be registered.
- Around line 62-71: register_worker() may fail to connect and its error is not
handled; change the code so you capture the Result from register_worker (instead
of unwrapping or ignoring errors), propagate or map the error into main()'s
Result with a clear message (e.g., using ? or map_err/anyhow::Context), only
create iii_arc and call functions::register_all(&iii_arc, &config) after a
successful register_worker return, and return an Err from main() with a
descriptive error when the connection to the III engine cannot be established.
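The fail-fast shape, sketched with plain Result types (the real code would use the engine's register_worker / register_trigger APIs and anyhow context; the stand-in functions below are assumptions for illustration):

```rust
/// Stand-in for the engine connection call in main.rs.
fn register_worker() -> Result<(), String> {
    Ok(())
}

/// Stand-in for registering one trigger with the engine.
fn register_trigger(function_id: &str) -> Result<(), String> {
    if function_id.is_empty() {
        return Err("empty function_id".to_string());
    }
    Ok(())
}

/// Errors are propagated with `?` so startup aborts with context instead
/// of limping on after a warning.
fn startup() -> Result<(), String> {
    register_worker().map_err(|e| format!("failed to connect to engine: {e}"))?;
    for fid in ["eval::drift", "eval::ingest", "eval::analyze_traces"] {
        register_trigger(fid)
            .map_err(|e| format!("failed to register trigger {fid}: {e}"))?;
    }
    Ok(())
}
```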
---
Nitpick comments:
In `@eval/SPEC.md`:
- Around line 13-19: Several fenced code blocks in SPEC.md (e.g., the diagram
block starting "Your Workers → OTel spans → PubSub topic \"telemetry.spans\" →
eval::ingest" and the JSON-like block starting "Input: { function_id: string,")
are missing language identifiers; update each triple-backtick fence to include
an appropriate language label (e.g., ```text for plain diagrams, ```json for
JSON examples, ```yaml for YAML snippets) for all affected blocks (also apply
same change to the ranges noted: 25-29, 35-48, 54-61, 67-76, 86-96, 102-106,
111-119, 125-133) so that syntax highlighting and rendering are correct.
In `@eval/src/config.rs`:
- Around line 51-54: After deserializing into EvalConfig in load_config,
validate semantic constraints before returning: check that
EvalConfig.drift_threshold is within 0.0..=1.0 and that
EvalConfig.max_spans_per_function is > 0 (and add any other domain rules
required by EvalConfig); if any check fails return an Err with a clear message
(e.g., using anyhow::bail or Result::Err with a descriptive string) instead of
Ok(config). Ensure the validation lives in load_config immediately after
serde_yaml::from_str so callers never receive an out-of-range EvalConfig.
In `@eval/src/functions/analyze.rs`:
- Around line 256-268: The current mapping that builds errors from stats_map
computes error_rate by dividing s.error_count by s.invocations (in the map that
produces the `errors: Vec<Value>`), but this relies on the invariant that
invocations > 0; add a defensive guard inside that closure used in the map
(referencing `stats_map`, the closure parameters `(fid, s)`, and
`s.invocations`) so that if `s.invocations == 0` you set error_rate to 0.0 (or
skip the entry) instead of performing the division, and then continue to
construct the JSON object with that guarded rate.
- Around line 108-118: The error mapping for the external call
iii.trigger(TriggerRequest { timeout_ms: Some(10_000), ... }) should distinguish
timeouts from other failures; update the map_err closure that produces
IIIError::Handler for traces_response to inspect the returned error (e.g., check
for an is_timeout() method or look for "timeout" in e.to_string()) and return a
clearer message like "timed out fetching traces from engine" for timeout cases
and keep the original "failed to fetch traces from engine: {e}" for other errors
so callers can tell timeout vs other failures.
In `@eval/src/functions/drift.rs`:
- Around line 53-67: In the loop over dimensions in drift.rs (the dimensions
array and code computing baseline_v, current_v and delta_pct), special-case the
"success_rate" dimension: instead of computing delta_pct as a relative change
using baseline_v (the existing delta_pct logic), compute an absolute change as
abs(current_v - baseline_v) and compare that to an absolute threshold (e.g.,
0.05 for 5 percentage points); keep the current relative delta_pct calculation
for the latency metrics ("p50_ms", "p95_ms", "p99_ms", "avg_duration_ms").
Ensure you reference the existing symbols baseline_val, baseline_v, current,
current_v, and delta_pct when adding this branch.
In `@eval/src/functions/ingest.rs`:
- Around line 21-26: The require_u64 helper (fn require_u64) currently returns a
generic "missing {field}" error; change it to distinguish between a missing key
and a present-but-wrong-type value by first checking payload.get(field): if None
return IIIError::Handler(format!("missing field '{}'", field)), else if
value.as_u64() is None return IIIError::Handler(format!("field '{}' has wrong
type: expected u64, found {}", field, value.type_name() or value.to_string())),
so callers can see whether the key was absent or contained a non-u64 value;
reference require_u64 and use payload.get(field) and value.as_u64() to implement
this branching.
In `@eval/src/functions/metrics.rs`:
- Around line 48-53: The code currently sets total = spans.len() while
avg_duration_ms divides by durations.len(), causing total_invocations (total) to
differ from the average's denominator if some spans lack duration_ms; update the
calculation to use a consistent count: compute a new valid_count =
durations.len() (or filter spans to count only those with duration_ms), use
valid_count for the average denominator and for total_invocations (or explicitly
document that total_invocations reflects all spans while average is over valid
durations). Locate the variables total, spans, durations.len(), avg_duration_ms
and total_invocations in metrics.rs and either set total = durations.len() (or
total = valid_count) or add a separate total_valid_durations and use that for
avg_duration_ms and output consistently.
In `@eval/src/functions/mod.rs`:
- Around line 74-93: The registration wrapper in mod.rs currently performs
function index management (checking fid, fetching via state::state_get,
deserializing, pushing fid, and persisting with state::state_set using
ingest::SCOPE_INDEX and ingest::INDEX_KEY); move this entire index-update block
into ingest::handle so related ingest/register logic lives together—remove the
index update from the registration wrapper and call or invoke the updated
ingest::handle path that performs: retrieve index with state::state_get, safely
deserialize to Vec<String>, check contains(&fid), push fid, and persist with
state::state_set, preserving the tracing::warn on deserialize or state_set
errors. Ensure you reference the same symbols (fid, state::state_get,
state::state_set, ingest::SCOPE_INDEX, ingest::INDEX_KEY, ingest::handle) and
keep behavior (no-op if fid empty) unchanged.
In `@eval/src/functions/report.rs`:
- Around line 35-42: Replace the silent .ok() drop of errors when calling
crate::functions::drift::handle so drift failures are observable: change the
handling around the drift_result (the block using has_baseline, drift_payload,
and the call to crate::functions::drift::handle(iii, config, drift_payload)) to
capture the Result, log the error via the existing logger or attach an error
string to the report (e.g., a drift_error field) instead of returning None, and
ensure success still yields the drift payload; this preserves the existing flow
but surfaces errors for observability.
- Around line 18-50: The per-function loop currently calls
crate::functions::drift::handle(...) for each fid; instead, call
crate::functions::drift::handle once outside the loop with a payload that omits
"function_id" to get drift results for all functions, then inside the loop
lookup the drift entry for each fid (e.g., map by function_id) when building
function_reports; adjust the existing variables (function_ids iteration,
existing/state_get logic, compute_metrics call) to use the single drift result
map rather than invoking drift::handle per fid and fall back to None when a fid
has no entry.
In `@eval/src/functions/score.rs`:
- Line 7: The handle function in eval/src/functions/score.rs currently hardcodes
thresholds (0.95, 5000ms, 0.80, 10000ms); change its signature to accept an
&EvalConfig (or a reference to it) and replace those literal values with the
corresponding fields from EvalConfig (e.g., drift_threshold and
timeout/threshold fields), and pass that config into drift::handle (which
already expects &EvalConfig); update any callers of score::handle to supply the
EvalConfig reference so scoring behavior becomes configurable.
In `@eval/src/manifest.rs`:
- Around line 17-23: The manifest hardcodes default_config JSON that duplicates
EvalConfig::default(), causing drift; replace the hardcoded serde_json::json!
block with a conversion from EvalConfig::default() (e.g., serialize or convert
EvalConfig::default() into a serde_json::Value) so default_config is derived
from EvalConfig::default() instead of duplicated; update the code that
constructs default_config (the variable named default_config in this diff) to
call EvalConfig::default() and serialize/convert it into the same JSON/Value
shape used elsewhere.
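A stdlib-only sketch of the single-source-of-truth idea: `default_config` is rendered from `EvalConfig::default()` rather than duplicated by hand. In the real module this would be `serde_json::to_value(EvalConfig::default())`; the fields and default values below are assumptions:

```rust
#[derive(Debug)]
struct EvalConfig {
    retention_hours: u64,
    drift_threshold: f64,
}

impl Default for EvalConfig {
    fn default() -> Self {
        // Assumed defaults, for illustration only.
        Self { retention_hours: 24, drift_threshold: 0.15 }
    }
}

// Manifest JSON derived from Default, so the two can never drift apart.
fn default_config_json() -> String {
    let c = EvalConfig::default();
    format!(
        "{{\"retention_hours\":{},\"drift_threshold\":{}}}",
        c.retention_hours, c.drift_threshold
    )
}

fn main() {
    let json = default_config_json();
    assert!(json.contains("\"retention_hours\":24"));
    println!("{json}"); // → {"retention_hours":24,"drift_threshold":0.15}
}
```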
ℹ️ Review info

⚙️ Run configuration
- Configuration used: Organization UI
- Review profile: CHILL
- Plan: Pro
- Run ID: 82ebb290-77da-43ab-a529-29e62ee9cc33

📒 Files selected for processing (17)
- eval/Cargo.toml
- eval/README.md
- eval/SPEC.md
- eval/build.rs
- eval/config.yaml
- eval/src/config.rs
- eval/src/functions/analyze.rs
- eval/src/functions/baseline.rs
- eval/src/functions/drift.rs
- eval/src/functions/ingest.rs
- eval/src/functions/metrics.rs
- eval/src/functions/mod.rs
- eval/src/functions/report.rs
- eval/src/functions/score.rs
- eval/src/functions/state.rs
- eval/src/main.rs
- eval/src/manifest.rs
```
Options:
  --config <PATH>  Path to config.yaml [default: ./config.yaml]
  --url <URL>      WebSocket URL of the iii engine [default: ws://127.0.0.1:49134]
  --manifest       Output module manifest as JSON and exit
  -h, --help       Print help
```
Add a language to the fenced options block.
At Line 40, the code fence has no language tag (MD040).
Proposed fix

-```
+```text
 Options:
   --config <PATH>  Path to config.yaml [default: ./config.yaml]
   --url <URL>      WebSocket URL of the iii engine [default: ws://127.0.0.1:49134]
   --manifest       Output module manifest as JSON and exit
   -h, --help       Print help
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 40-40: Fenced code blocks should have a language specified (MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @eval/README.md around lines 40 - 46, the fenced code block that begins with "Options:" is missing a language tag; update the opening fence from ``` to ```text to satisfy MD040; leave the block contents and closing fence unchanged.
```
eval:spans:{function_id} — array of span objects (capped at max_spans_per_function)
eval:baselines:{function_id} — baseline metrics snapshot for drift comparison
eval:function_index — list of all tracked function IDs
```
State scope key mismatch with implementation.
The spec says eval:function_index but the code (ingest.rs:10-11) uses:
SCOPE_INDEX = "eval:index"
INDEX_KEY = "function_list"
Update the spec to match the implementation.
Suggested fix
eval:spans:{function_id} — array of span objects (capped at max_spans_per_function)
eval:baselines:{function_id} — baseline metrics snapshot for drift comparison
-eval:function_index — list of all tracked function IDs
+eval:index:function_list — list of all tracked function IDs
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 25-25: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@eval/SPEC.md` around lines 25 - 29, Update the spec entry to match the
implementation: replace the line that documents eval:function_index with the
implemented scope and key by using the SCOPE_INDEX and INDEX_KEY names — i.e.,
document the state key as eval:index:function_list and the description as "list
of all tracked function IDs" so the spec aligns with the constants SCOPE_INDEX
and INDEX_KEY used in ingest.rs.
```rust
#[derive(Deserialize, Debug, Clone)]
pub struct EvalConfig {
    #[serde(default = "default_retention_hours")]
    pub retention_hours: u64,
    #[serde(default = "default_drift_threshold")]
    pub drift_threshold: f64,
    #[serde(default = "default_cron_drift_check")]
    pub cron_drift_check: String,
    #[serde(default = "default_max_spans_per_function")]
    pub max_spans_per_function: usize,
    #[allow(dead_code)]
    #[serde(default = "default_baseline_window_minutes")]
    pub baseline_window_minutes: u64,
}
```
Reject unknown config keys to avoid silent misconfiguration.
A typo in YAML keys will currently be ignored and defaults will be applied silently. Add deny_unknown_fields on EvalConfig so invalid config fails fast.
Proposed fix
-#[derive(Deserialize, Debug, Clone)]
+#[derive(Deserialize, Debug, Clone)]
+#[serde(deny_unknown_fields)]
 pub struct EvalConfig {

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
#[derive(Deserialize, Debug, Clone)]
#[serde(deny_unknown_fields)]
pub struct EvalConfig {
    #[serde(default = "default_retention_hours")]
    pub retention_hours: u64,
    #[serde(default = "default_drift_threshold")]
    pub drift_threshold: f64,
    #[serde(default = "default_cron_drift_check")]
    pub cron_drift_check: String,
    #[serde(default = "default_max_spans_per_function")]
    pub max_spans_per_function: usize,
    #[allow(dead_code)]
    #[serde(default = "default_baseline_window_minutes")]
    pub baseline_window_minutes: u64,
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@eval/src/config.rs` around lines 4 - 17, The EvalConfig struct allows
unknown/typoed config keys to be ignored; add serde's deny_unknown_fields
attribute to EvalConfig so unknown YAML keys cause deserialization to fail fast.
Modify the struct annotation for EvalConfig to include
#[serde(deny_unknown_fields)] (in addition to the existing derives) so
deserializing into EvalConfig will reject unexpected fields and surface
configuration mistakes.
```rust
let existing = state_get(iii, crate::functions::ingest::SCOPE_SPANS, function_id).await.unwrap_or(json!(null));
let spans: Vec<Value> = if existing.is_array() {
    serde_json::from_value(existing).unwrap_or_default()
} else {
    Vec::new()
};
```
Do not collapse state read failures into "no span data".
Line 13 currently turns all state_get errors into null, which can hide backend/read failures and produce misleading baseline responses.
Proposed fix
- let existing = state_get(iii, crate::functions::ingest::SCOPE_SPANS, function_id).await.unwrap_or(json!(null));
+ let existing = state_get(iii, crate::functions::ingest::SCOPE_SPANS, function_id)
+ .await
+ .map_err(|e| IIIError::Handler(format!("failed to read spans for {function_id}: {e}")))?;

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let existing = state_get(iii, crate::functions::ingest::SCOPE_SPANS, function_id)
    .await
    .map_err(|e| IIIError::Handler(format!("failed to read spans for {function_id}: {e}")))?;
let spans: Vec<Value> = if existing.is_array() {
    serde_json::from_value(existing).unwrap_or_default()
} else {
    Vec::new()
};
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@eval/src/functions/baseline.rs` around lines 13 - 18, The code currently
collapses all state_get failures into json!(null) which hides backend/read
errors; change the state read to explicitly handle the Result from state_get (do
not use unwrap_or(json!(null))). Match on state_get(...).await: on Err(e)
propagate or return the error (or log and return a failure response) so callers
know the read failed; on Ok(existing) then check if existing.is_array() and
parse the array into spans (using serde_json::from_value on existing) and treat
Null or non-array as empty Vec. Update handling around the variables state_get,
SCOPE_SPANS, function_id, existing, and spans so read errors are not converted
into "no span data".
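The error-preserving pattern the prompt describes can be sketched without the SDK; `state_get` here is a stub and the span payload is simplified to `Vec<u64>` — the point is that "backend failed" and "no data" stay distinguishable:

```rust
// Stub: Ok(None) means "no data stored", Err means the backend failed.
fn state_get(key: &str) -> Result<Option<Vec<u64>>, String> {
    match key {
        "f1" => Ok(Some(vec![12, 40])),
        "missing" => Ok(None),
        _ => Err("backend unreachable".to_string()),
    }
}

// Read errors propagate to the caller; only a genuine absence of data
// yields an empty span list.
fn load_spans(key: &str) -> Result<Vec<u64>, String> {
    match state_get(key) {
        Ok(Some(spans)) => Ok(spans),
        Ok(None) => Ok(Vec::new()),
        Err(e) => Err(format!("failed to read spans for {key}: {e}")),
    }
}

fn main() {
    assert_eq!(load_spans("f1").unwrap(), vec![12, 40]);
    assert_eq!(load_spans("missing").unwrap(), Vec::<u64>::new());
    assert!(load_spans("boom").is_err()); // failure surfaces, not hidden
    println!("ok");
}
```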
```rust
if success_rate < 0.95 {
    let penalty = (0.95 - success_rate) * 200.0;
    fn_score -= penalty;
    issues.push(json!({
        "function_id": fid,
        "issue": "low_success_rate",
        "value": success_rate,
    }));
}

if p99 > 5000 {
    let penalty = ((p99 as f64 - 5000.0) / 1000.0).min(30.0);
    fn_score -= penalty;
    issues.push(json!({
        "function_id": fid,
        "issue": "high_p99_latency",
        "value_ms": p99,
    }));
}
```
Hardcoded scoring thresholds may conflict with drift detection logic.
The scoring penalties use absolute thresholds (success_rate < 0.95, p99 > 5000ms), while drift detection (per context snippet from drift.rs:53-79) uses a configurable relative threshold (default 15%). This can produce contradictory signals:
- A function with 94% success rate triggers a score penalty but may not trigger drift if baseline was 93%.
- A function degrading from 99% to 86% (13% relative change) won't trigger drift but will trigger scoring issues.
Consider aligning the thresholds or documenting the intentional difference between "absolute health" (score) vs "relative change" (drift).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@eval/src/functions/score.rs` around lines 56 - 74, Scoring uses hardcoded
absolute thresholds (success_rate < 0.95, p99 > 5000) which can conflict with
the relative drift logic (15% default); change the scoring in the function that
adjusts fn_score (the block referencing fn_score, success_rate, p99, and
issues.push) to use configurable thresholds (e.g., success_rate_threshold and
p99_threshold) or compute thresholds from the same relative baseline used by
drift detection, then read those values from the same config/params used by
drift.rs (or surface them in the shared config struct), and update issue
payloads/tests/docs to reflect the configurable/derived thresholds so scoring
and drift detection align or explicitly document the intended difference between
absolute health vs relative drift.
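One way to align the two signals is to derive the scoring floor from the same relative tolerance drift uses; the helper names below are hypothetical, not the worker's actual API:

```rust
// Derive an absolute floor from the baseline and the drift tolerance,
// so scoring and drift detection fire on the same degradation.
fn success_rate_floor(baseline_rate: f64, drift_threshold: f64) -> f64 {
    baseline_rate * (1.0 - drift_threshold)
}

fn is_degraded(current: f64, baseline: f64, drift_threshold: f64) -> bool {
    current < success_rate_floor(baseline, drift_threshold)
}

fn main() {
    // 99% baseline with a 15% tolerance → floor ≈ 0.8415
    assert!(!is_degraded(0.86, 0.99, 0.15)); // 13% relative drop: neither signal fires
    assert!(is_degraded(0.80, 0.99, 0.15));  // past the tolerance: both fire
    println!("ok");
}
```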
```rust
let iii = register_worker(
    &cli.url,
    InitOptions {
        otel: Some(OtelConfig::default()),
        ..Default::default()
    },
);

let iii_arc = Arc::new(iii);
functions::register_all(&iii_arc, &config);
```
🧩 Analysis chain

🏁 Scripts executed against iii-hq/workers:

```shell
# Check the return type and error handling of register_worker in iii_sdk
rg -n "register_worker" --type rust -C 5
ast-grep --pattern 'fn register_worker($$$) -> $RET { $$$ }'

# Locate a local definition of register_worker
find . -name "*.rs" -type f | xargs grep -l "pub fn register_worker\|pub async fn register_worker" | head -20

# Check whether iii_sdk is local or an external dependency
cat Cargo.toml | grep -A 5 "\[dependencies\]" | head -20
ls -la | grep -E "iii_sdk|Cargo"
find . -name "Cargo.toml" -type f | head -20
rg "iii_sdk" --type toml -C 3
cat eval/Cargo.toml
cat image-resize/Cargo.toml

# Look for error-handling patterns around register_worker and main()
fd "main\.rs" --type f | xargs rg -l "register_worker" | head -10
rg "fn main|async fn main" --type rust -A 20 eval/src/main.rs | head -40
rg "panic|catch_unwind|std::panic" --type rust
rg "register_worker" --type rust --type md --type txt -B 5 -A 5
rg "let iii = register_worker" --type rust -A 20

# Check tests and documentation for register_worker behavior
find . -name "*test*" -o -name "*spec*" | grep -E "\.rs$"
find . -name "README*" -o -name "*.md" | head -10
cat README.md
cat eval/README.md
```
Add error handling for register_worker connection failures.
The register_worker call connects to the III engine via WebSocket synchronously, but the connection error is not exposed for handling. If the engine is unreachable at startup, the behavior is unclear (likely panic or hang). Since main() returns Result<()>, consider wrapping the connection attempt in error handling to enable graceful startup failure with a clear error message when the engine is unavailable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@eval/src/main.rs` around lines 62 - 71, register_worker() may fail to connect
and its error is not handled; change the code so you capture the Result from
register_worker (instead of unwrapping or ignoring errors), propagate or map the
error into main()'s Result with a clear message (e.g., using ? or
map_err/anyhow::Context), only create iii_arc and call
functions::register_all(&iii_arc, &config) after a successful register_worker
return, and return an Err from main() with a descriptive error when the
connection to the III engine cannot be established.
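A minimal sketch of the fail-fast startup the comment asks for, with `register_worker` stubbed and `String` standing in for the SDK's error type (the real call and signature may differ):

```rust
// Stub: succeeds for ws:// URLs, fails otherwise.
fn register_worker(url: &str) -> Result<(), String> {
    if url.starts_with("ws://") { Ok(()) } else { Err("connection refused".into()) }
}

// Connection errors are wrapped with context and propagated; function
// registration would only happen after a successful connection.
fn start(url: &str) -> Result<(), String> {
    register_worker(url)
        .map_err(|e| format!("failed to connect to iii engine at {url}: {e}"))?;
    Ok(())
}

fn main() {
    assert!(start("ws://127.0.0.1:49134").is_ok());
    let err = start("tcp://bad").unwrap_err();
    assert!(err.contains("failed to connect to iii engine"));
    println!("ok");
}
```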
```rust
match iii_arc.register_trigger(RegisterTriggerInput {
    trigger_type: "cron".to_string(),
    function_id: "eval::drift".to_string(),
    config: json!({ "expression": cron_expression }),
}) {
    Ok(_) => tracing::info!("cron trigger registered for eval::drift"),
    Err(e) => tracing::warn!(error = %e, "failed to register cron trigger"),
}

match iii_arc.register_trigger(RegisterTriggerInput {
    trigger_type: "subscribe".to_string(),
    function_id: "eval::ingest".to_string(),
    config: json!({ "topic": "telemetry.spans" }),
}) {
    Ok(_) => tracing::info!("subscribe trigger registered for eval::ingest on telemetry.spans"),
    Err(e) => tracing::warn!(error = %e, "failed to register subscribe trigger"),
}

match iii_arc.register_trigger(RegisterTriggerInput {
    trigger_type: "http".to_string(),
    function_id: "eval::analyze_traces".to_string(),
    config: json!({
        "api_path": "eval/analyze",
        "http_method": "POST"
    }),
}) {
    Ok(_) => tracing::info!("http trigger registered for eval::analyze_traces"),
    Err(e) => tracing::warn!(error = %e, "failed to register http trigger for analyze_traces"),
}
```
Trigger registration failures silently degrade the worker.
All three trigger registrations log warnings on failure but allow the worker to continue. This means the worker could report "ready" while being non-functional (e.g., no span ingestion if subscribe trigger fails, no drift detection if cron fails).
Consider either:
- Treating trigger registration failures as fatal errors that prevent startup
- Exposing registration status via a health check endpoint
- At minimum, logging at `error` level rather than `warn`, since these are critical capabilities
🛠️ Proposed fix: Fail fast on critical trigger registration failures
- match iii_arc.register_trigger(RegisterTriggerInput {
+ iii_arc.register_trigger(RegisterTriggerInput {
trigger_type: "cron".to_string(),
function_id: "eval::drift".to_string(),
config: json!({ "expression": cron_expression }),
- }) {
- Ok(_) => tracing::info!("cron trigger registered for eval::drift"),
- Err(e) => tracing::warn!(error = %e, "failed to register cron trigger"),
- }
+ }).map_err(|e| {
+ tracing::error!(error = %e, "failed to register cron trigger");
+ e
+ })?;
+ tracing::info!("cron trigger registered for eval::drift");
- match iii_arc.register_trigger(RegisterTriggerInput {
+ iii_arc.register_trigger(RegisterTriggerInput {
trigger_type: "subscribe".to_string(),
function_id: "eval::ingest".to_string(),
config: json!({ "topic": "telemetry.spans" }),
- }) {
- Ok(_) => tracing::info!("subscribe trigger registered for eval::ingest on telemetry.spans"),
- Err(e) => tracing::warn!(error = %e, "failed to register subscribe trigger"),
- }
+ }).map_err(|e| {
+ tracing::error!(error = %e, "failed to register subscribe trigger");
+ e
+ })?;
+ tracing::info!("subscribe trigger registered for eval::ingest on telemetry.spans");
- match iii_arc.register_trigger(RegisterTriggerInput {
+ iii_arc.register_trigger(RegisterTriggerInput {
trigger_type: "http".to_string(),
function_id: "eval::analyze_traces".to_string(),
config: json!({
"api_path": "eval/analyze",
"http_method": "POST"
}),
- }) {
- Ok(_) => tracing::info!("http trigger registered for eval::analyze_traces"),
- Err(e) => tracing::warn!(error = %e, "failed to register http trigger for analyze_traces"),
- }
+ }).map_err(|e| {
+ tracing::error!(error = %e, "failed to register http trigger for analyze_traces");
+ e
+ })?;
+ tracing::info!("http trigger registered for eval::analyze_traces");

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@eval/src/main.rs` around lines 75 - 103, The trigger registration code calls
iii_arc.register_trigger for three critical triggers (cron -> function_id
"eval::drift", subscribe -> "eval::ingest", and http -> "eval::analyze_traces")
but only logs warnings on failure; change this to fail fast by treating
registration errors as fatal: propagate the Err out of main (or return an
Err/exit process) instead of continuing, e.g., replace the Err branches that do
tracing::warn!(...) with returning the error (or calling
anyhow::bail!/process::exit) and log at error level with context including the
register_trigger error and which RegisterTriggerInput failed (reference the
register_trigger calls and the RegisterTriggerInput structs/cron_expression) so
startup aborts when any of these critical triggers cannot be registered.
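The fail-fast variant can also be factored into a small loop so the three registrations share one error path; `register_trigger` is stubbed here and the trigger names mirror the review:

```rust
// Stub standing in for iii_arc.register_trigger; always succeeds for
// non-empty trigger kinds.
fn register_trigger(kind: &str, function_id: &str) -> Result<(), String> {
    if kind.is_empty() { Err(format!("cannot register {function_id}")) } else { Ok(()) }
}

// The first failed registration aborts startup with context.
fn register_all_triggers() -> Result<(), String> {
    for (kind, fid) in [
        ("cron", "eval::drift"),
        ("subscribe", "eval::ingest"),
        ("http", "eval::analyze_traces"),
    ] {
        register_trigger(kind, fid)
            .map_err(|e| format!("critical trigger {kind}/{fid} failed: {e}"))?;
    }
    Ok(())
}

fn main() {
    assert!(register_all_triggers().is_ok());
    println!("ok");
}
```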
Summary

Functions
- eval::ingest (telemetry.spans)
- eval::metrics
- eval::score
- eval::drift
- eval::baseline
- eval::report
- eval::analyze_traces

Test plan
- cargo test — 19 tests passing
- cargo check — 0 warnings

Summary by CodeRabbit
Release Notes
New Features
Documentation