[codex] add Langfuse tracing and chat debug correlation by penso · Pull Request #535 · moltis-org/moltis

penso · 2026-04-01T10:34:02Z

Summary

add opt-in Langfuse tracing via OTLP/HTTP at CLI startup, with config, sampling, auth headers, and shutdown flushing
instrument chat and agent execution with Langfuse root, generation, and tool observations, including sanitized payload capture and trace IDs in websocket payloads
persist and surface trace IDs in session history, aborted partials, and the run detail UI so failed and aborted runs stay debuggable after reload

Validation

Completed

Remaining

just lint
Blocked by an unrelated existing Clippy failure in crates/voice/src/stt/elevenlabs.rs (clippy::manual_inspect).
cargo check -p moltis-chat -p moltis-web
Blocked by unrelated existing errors in crates/httpd/src/ssh_routes.rs (state.gateway.vault no longer exists).

Manual QA

enable [metrics.langfuse] in config with valid Langfuse credentials
run a normal chat turn, a tool-using turn, and an aborted turn
verify the Langfuse project shows root run traces, child generation spans, and child tool spans
verify the chat UI footer and run detail panel show the trace ID, and aborted partials keep the same trace correlation after reload

codspeed-hq · 2026-04-01T10:39:22Z

Merging this PR will improve performance by 79.59%

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
✅ 38 untouched benchmarks
⏩ 5 skipped benchmarks¹

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	`env_substitution`	18.2 µs	10.1 µs	+79.59%

_{Comparing pickle-receipt (147d22a) with main (3272781)}

5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

codecov · 2026-04-01T10:41:37Z

Codecov Report

❌ Patch coverage is 65.20376% with 555 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
crates/chat/src/lib.rs	56.81%	317 Missing ⚠️
crates/agents/src/runner.rs	82.64%	76 Missing ⚠️
crates/cli/src/main.rs	0.00%	57 Missing ⚠️
crates/cli/src/telemetry.rs	48.64%	57 Missing ⚠️
crates/common/src/observability.rs	81.81%	30 Missing ⚠️
crates/config/src/schema.rs	63.04%	17 Missing ⚠️
crates/sessions/src/message.rs	96.66%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-04-01T20:34:57Z

Greptile Summary

This PR adds opt-in Langfuse tracing via OpenTelemetry OTLP/HTTP, hooking into the existing tracing subscriber infrastructure at CLI startup. When metrics.langfuse.enabled = true in the config, a SdkTracerProvider is installed that exports spans to Langfuse using Basic auth; all other code paths are unaffected at runtime.

New moltis_common::observability module provides sanitization primitives (redact_json_value, redact_text, is_sensitive_key) with a careful two-tier key-matching strategy that avoids false positives on token_count/output_tokens (exact match for short keys like "token") while still catching access_token/refresh_token (substring match). Previous review concerns about "token" matching "token_count" and about redact_line stopping at the first separator are both resolved in the current implementation.
crates/cli/src/telemetry.rs constructs the OTLP exporter with configurable sampling, Base64 Basic auth, and a TelemetryHandles::shutdown flush on process exit.
crates/agents/src/runner.rs instruments each LLM request and tool call with child spans tagged langfuse.observation.type = generation/tool. Telemetry settings propagate via __telemetry_* keys in the tool_context JSON, which are stripped before tool arguments are forwarded.
crates/chat/src/lib.rs creates the root chat.run span per user turn, extracts a trace ID only when the span is both valid and sampled (is_sampled() guard — the exact fix requested in the prior review thread), stores it in ActiveAssistantDraft, and threads it through ChatFinalBroadcast, ChatErrorBroadcast, and AssistantTurnOutput so it reaches persisted session history.
crates/sessions/src/message.rs adds trace_id: Option<String> to PersistedMessage::Assistant with skip_serializing_if = "Option::is_none" — a backwards-compatible JSONL change.
Web UI surfaces trace IDs in message footers (sessions.js), the run-detail panel (run-detail.js), and error blocks (websocket.js).

Two P2 style issues remain: opentelemetry/tracing-opentelemetry are unconditional deps in agents and chat instead of being feature-gated per CLAUDE.md conventions, and LangfuseContentSettings/ObservabilitySettings are structurally duplicated across the two crates.

Confidence Score: 5/5

Safe to merge — all prior P0/P1 review concerns are resolved and remaining findings are P2 style suggestions

The three specific concerns from the previous review thread are fully addressed: (1) SENSITIVE_KEYS now uses exact matching for "token" via EXACT_SENSITIVE_KEYS, preventing false positives on token_count/output_tokens; (2) trace_id_for_span correctly guards on span_context.is_sampled(), so unsampled traces never produce stored IDs; (3) redact_inline_assignments iterates over all char_indices rather than stopping at the first separator. The two remaining P2 items (missing feature gate for OTel deps, duplicate settings struct) are maintenance-quality issues that don't affect correctness or security.

crates/agents/Cargo.toml and crates/chat/Cargo.toml — opentelemetry/tracing-opentelemetry should be optional deps with a feature gate; crates/chat/src/lib.rs — LangfuseContentSettings duplicates ObservabilitySettings in runner crate

Important Files Changed

Filename	Overview
crates/cli/src/telemetry.rs	New file: Langfuse OTLP exporter setup with Basic auth headers, configurable `ParentBased(TraceIdRatioBased)` sampling, and `TelemetryHandles::shutdown` flush — clean and well-tested
crates/common/src/observability.rs	New file: sanitization helpers with two-tier key matching (exact vs substring) resolving prior review concerns about `token_count` false positives; `redact_inline_assignments` scans all separators; well-tested
crates/agents/src/runner.rs	Instruments LLM and tool calls with child Langfuse spans via `__telemetry_*` tool context propagation; OTel added as non-optional dep violating CLAUDE.md feature-gate requirement
crates/chat/src/lib.rs	Root `chat.run` span creation with correct `is_sampled()` guard; `trace_id` propagated through draft/final/error paths; `LangfuseContentSettings` duplicates `ObservabilitySettings` in runner crate
crates/config/src/schema.rs	Adds `LangfuseConfig` with full `build_schema_map` coverage; manual `Debug` impl redacts credentials; `TraceContentMode` defaults to `Sanitized`
crates/sessions/src/message.rs	Backwards-compatible `trace_id: Option<String>` addition to `PersistedMessage::Assistant` with `skip_serializing_if` — no migration needed for existing JSONL files
crates/web/src/assets/js/websocket.js	Renders trace ID in message footer and error blocks; persists `trace_id` to session store on `final` and `abort` events
crates/web/src/assets/js/components/run-detail.js	Aggregates trace IDs from all assistant messages in a `Set` (deduplicating multi-iteration runs) and renders inline per message

Sequence Diagram

sequenceDiagram
    participant User as User / WS Client
    participant Chat as ChatService (chat/lib.rs)
    participant Runner as AgentRunner (runner.rs)
    participant LLM as LLM Provider
    participant Langfuse as Langfuse / OTLP

    User->>Chat: send message
    Chat->>Chat: create chat.run span (run_span)
    Chat->>Chat: extract trace_id (if is_valid && is_sampled)
    Chat->>Chat: store trace_id in ActiveAssistantDraft
    Chat->>Runner: run_agent_loop_streaming().instrument(run_span)
    loop Each LLM iteration
        Runner->>Runner: create llm.request span (child of run_span)
        Runner->>Runner: set input / metadata attributes
        Runner->>LLM: stream_with_tools()
        LLM-->>Runner: stream events
        Runner->>Runner: set output / usage attributes on llm_span
        opt Tool calls
            Runner->>Runner: create tool.call span (child of run_span)
            Runner->>Runner: execute tool
            Runner->>Runner: set tool output / error attributes
        end
    end
    Runner-->>Chat: AssistantTurnOutput (+ trace_id)
    Chat->>Chat: persist PersistedMessage::Assistant { trace_id }
    Chat->>User: ChatFinalBroadcast { traceId }
    Chat->>Langfuse: OTLP batch export (async, on shutdown)

_{Reviews (3): Last reviewed commit: "fix(config): preserve langfuse trace def..." | Re-trigger Greptile}

crates/common/src/observability.rs

greptile-apps · 2026-04-01T20:35:02Z

crates/cli/src/telemetry.rs

+    let sample_rate = langfuse.sample_rate.clamp(0.0, 1.0);
+    let provider = SdkTracerProvider::builder()
+        .with_sampler(Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(
+            sample_rate,
+        ))))
+        .with_resource(Resource::builder_empty().with_attributes(resource).build())
+        .with_batch_exporter(exporter)
+        .build();


Sampled-out traces still produce stored trace IDs

Sampler::TraceIdRatioBased assigns a valid trace_id to every root span regardless of the sampling decision — only the export is suppressed. Consequently, when sample_rate < 1.0, trace_id_for_span returns a real hex trace ID even for runs that were never sent to Langfuse. Those IDs are then persisted in session history and rendered in the UI, where clicking the trace ID leads nowhere.

One option is to guard trace_id_for_span so it returns None when the span's SpanContext.is_sampled() flag is false:

fn trace_id_for_span(span: &Span) -> Option<String> { let context = span.context(); let trace_span = context.span(); let span_context = trace_span.span_context(); (span_context.is_valid() && span_context.is_sampled()) .then(|| span_context.trace_id().to_string()) }

crates/common/src/observability.rs

feat(observability): add langfuse tracing

147d22a

penso marked this pull request as ready for review April 1, 2026 20:25

greptile-apps bot reviewed Apr 1, 2026

View reviewed changes

github-actions bot mentioned this pull request Apr 2, 2026

🦞 OpenClaw 生态日报 2026-04-02 gsscsd/big_model_radar#121

Open

penso added 3 commits April 6, 2026 11:44

fix(observability): tighten trace redaction

ecbcd3e

fix(config): preserve langfuse trace defaults

7743bf4

refactor(observability): gate tracing internals

ea1c9b6

github-actions bot mentioned this pull request Apr 7, 2026

🦞 OpenClaw 生态日报 2026-04-07 gsscsd/big_model_radar#148

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] add Langfuse tracing and chat debug correlation#535

[codex] add Langfuse tracing and chat debug correlation#535
penso wants to merge 4 commits intomainfrom
pickle-receipt

penso commented Apr 1, 2026

Uh oh!

codspeed-hq bot commented Apr 1, 2026

Uh oh!

codecov bot commented Apr 1, 2026

Uh oh!

greptile-apps bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

greptile-apps bot Apr 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

penso commented Apr 1, 2026

Summary

Validation

Completed

Remaining

Manual QA

Uh oh!

codspeed-hq bot commented Apr 1, 2026

Merging this PR will improve performance by 79.59%

Performance Changes

Footnotes

Uh oh!

codecov bot commented Apr 1, 2026

Codecov Report

Uh oh!

greptile-apps bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

greptile-apps bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps bot commented Apr 1, 2026 •

edited

Loading