Skip to content

[codex] add Langfuse tracing and chat debug correlation#535

Open
penso wants to merge 4 commits intomainfrom
pickle-receipt
Open

[codex] add Langfuse tracing and chat debug correlation#535
penso wants to merge 4 commits intomainfrom
pickle-receipt

Conversation

@penso
Copy link
Copy Markdown
Collaborator

@penso penso commented Apr 1, 2026

Summary

  • add opt-in Langfuse tracing via OTLP/HTTP at CLI startup, with config, sampling, auth headers, and shutdown flushing
  • instrument chat and agent execution with Langfuse root, generation, and tool observations, including sanitized payload capture and trace IDs in websocket payloads
  • persist and surface trace IDs in session history, aborted partials, and the run detail UI so failed and aborted runs stay debuggable after reload

Validation

Completed

  • cargo +nightly-2025-11-30 fmt --all
  • cargo check -p moltis-chat -p moltis-sessions -p moltis-agents -p moltis
  • cargo check -p moltis-chat -p moltis-sessions
  • cargo test -p moltis-common --lib
  • cargo test -p moltis-sessions --lib
  • cargo test -p moltis-chat chat_final_broadcast_serializes_trace_id_in_camel_case
  • cargo test -p moltis-chat active_assistant_draft_persists_trace_id
  • cargo test -p moltis-chat abort_waits_for_pending_tool_history_before_persisting_partial
  • cargo test -p moltis-agents runner::tests::observability_settings_default_to_off_without_tool_context -- --exact
  • cargo test -p moltis-agents runner::tests::merge_tool_context_skips_internal_telemetry_keys -- --exact
  • cargo test -p moltis --bin moltis telemetry::tests::disabled_langfuse_returns_none -- --exact
  • cargo test -p moltis --bin moltis telemetry::tests::enabled_langfuse_requires_credentials -- --exact
  • biome check --write crates/web/src/assets/js/chat-ui.js crates/web/src/assets/js/websocket.js crates/web/src/assets/js/sessions.js
  • biome check --write crates/web/src/assets/js/websocket.js crates/web/src/assets/js/components/run-detail.js

Remaining

  • just lint
    Blocked by an unrelated existing Clippy failure in crates/voice/src/stt/elevenlabs.rs (clippy::manual_inspect).
  • cargo check -p moltis-chat -p moltis-web
    Blocked by unrelated existing errors in crates/httpd/src/ssh_routes.rs (state.gateway.vault no longer exists).

Manual QA

  • enable [metrics.langfuse] in config with valid Langfuse credentials
  • run a normal chat turn, a tool-using turn, and an aborted turn
  • verify the Langfuse project shows root run traces, child generation spans, and child tool spans
  • verify the chat UI footer and run detail panel show the trace ID, and aborted partials keep the same trace correlation after reload

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq bot commented Apr 1, 2026

Merging this PR will improve performance by 79.59%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
✅ 38 untouched benchmarks
⏩ 5 skipped benchmarks1

Performance Changes

Benchmark BASE HEAD Efficiency
env_substitution 18.2 µs 10.1 µs +79.59%

Comparing pickle-receipt (147d22a) with main (3272781)

Open in CodSpeed

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

@penso penso marked this pull request as ready for review April 1, 2026 20:25
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR adds opt-in Langfuse tracing via OpenTelemetry OTLP/HTTP, hooking into the existing tracing subscriber infrastructure at CLI startup. When metrics.langfuse.enabled = true in the config, a SdkTracerProvider is installed that exports spans to Langfuse using Basic auth; all other code paths are unaffected at runtime.

  • New moltis_common::observability module provides sanitization primitives (redact_json_value, redact_text, is_sensitive_key) with a careful two-tier key-matching strategy that avoids false positives on token_count/output_tokens (exact match for short keys like "token") while still catching access_token/refresh_token (substring match). Previous review concerns about "token" matching "token_count" and about redact_line stopping at the first separator are both resolved in the current implementation.
  • crates/cli/src/telemetry.rs constructs the OTLP exporter with configurable sampling, Base64 Basic auth, and a TelemetryHandles::shutdown flush on process exit.
  • crates/agents/src/runner.rs instruments each LLM request and tool call with child spans tagged langfuse.observation.type = generation/tool. Telemetry settings propagate via __telemetry_* keys in the tool_context JSON, which are stripped before tool arguments are forwarded.
  • crates/chat/src/lib.rs creates the root chat.run span per user turn, extracts a trace ID only when the span is both valid and sampled (is_sampled() guard — the exact fix requested in the prior review thread), stores it in ActiveAssistantDraft, and threads it through ChatFinalBroadcast, ChatErrorBroadcast, and AssistantTurnOutput so it reaches persisted session history.
  • crates/sessions/src/message.rs adds trace_id: Option<String> to PersistedMessage::Assistant with skip_serializing_if = "Option::is_none" — a backwards-compatible JSONL change.
  • Web UI surfaces trace IDs in message footers (sessions.js), the run-detail panel (run-detail.js), and error blocks (websocket.js).

Two P2 style issues remain: opentelemetry/tracing-opentelemetry are unconditional deps in agents and chat instead of being feature-gated per CLAUDE.md conventions, and LangfuseContentSettings/ObservabilitySettings are structurally duplicated across the two crates.

Confidence Score: 5/5

Safe to merge — all prior P0/P1 review concerns are resolved and remaining findings are P2 style suggestions

The three specific concerns from the previous review thread are fully addressed: (1) SENSITIVE_KEYS now uses exact matching for "token" via EXACT_SENSITIVE_KEYS, preventing false positives on token_count/output_tokens; (2) trace_id_for_span correctly guards on span_context.is_sampled(), so unsampled traces never produce stored IDs; (3) redact_inline_assignments iterates over all char_indices rather than stopping at the first separator. The two remaining P2 items (missing feature gate for OTel deps, duplicate settings struct) are maintenance-quality issues that don't affect correctness or security.

crates/agents/Cargo.toml and crates/chat/Cargo.toml — opentelemetry/tracing-opentelemetry should be optional deps with a feature gate; crates/chat/src/lib.rs — LangfuseContentSettings duplicates ObservabilitySettings in runner crate

Important Files Changed

Filename Overview
crates/cli/src/telemetry.rs New file: Langfuse OTLP exporter setup with Basic auth headers, configurable ParentBased(TraceIdRatioBased) sampling, and TelemetryHandles::shutdown flush — clean and well-tested
crates/common/src/observability.rs New file: sanitization helpers with two-tier key matching (exact vs substring) resolving prior review concerns about token_count false positives; redact_inline_assignments scans all separators; well-tested
crates/agents/src/runner.rs Instruments LLM and tool calls with child Langfuse spans via __telemetry_* tool context propagation; OTel added as non-optional dep violating CLAUDE.md feature-gate requirement
crates/chat/src/lib.rs Root chat.run span creation with correct is_sampled() guard; trace_id propagated through draft/final/error paths; LangfuseContentSettings duplicates ObservabilitySettings in runner crate
crates/config/src/schema.rs Adds LangfuseConfig with full build_schema_map coverage; manual Debug impl redacts credentials; TraceContentMode defaults to Sanitized
crates/sessions/src/message.rs Backwards-compatible trace_id: Option<String> addition to PersistedMessage::Assistant with skip_serializing_if — no migration needed for existing JSONL files
crates/web/src/assets/js/websocket.js Renders trace ID in message footer and error blocks; persists trace_id to session store on final and abort events
crates/web/src/assets/js/components/run-detail.js Aggregates trace IDs from all assistant messages in a Set (deduplicating multi-iteration runs) and renders inline per message

Sequence Diagram

sequenceDiagram
    participant User as User / WS Client
    participant Chat as ChatService (chat/lib.rs)
    participant Runner as AgentRunner (runner.rs)
    participant LLM as LLM Provider
    participant Langfuse as Langfuse / OTLP

    User->>Chat: send message
    Chat->>Chat: create chat.run span (run_span)
    Chat->>Chat: extract trace_id (if is_valid && is_sampled)
    Chat->>Chat: store trace_id in ActiveAssistantDraft
    Chat->>Runner: run_agent_loop_streaming().instrument(run_span)
    loop Each LLM iteration
        Runner->>Runner: create llm.request span (child of run_span)
        Runner->>Runner: set input / metadata attributes
        Runner->>LLM: stream_with_tools()
        LLM-->>Runner: stream events
        Runner->>Runner: set output / usage attributes on llm_span
        opt Tool calls
            Runner->>Runner: create tool.call span (child of run_span)
            Runner->>Runner: execute tool
            Runner->>Runner: set tool output / error attributes
        end
    end
    Runner-->>Chat: AssistantTurnOutput (+ trace_id)
    Chat->>Chat: persist PersistedMessage::Assistant { trace_id }
    Chat->>User: ChatFinalBroadcast { traceId }
    Chat->>Langfuse: OTLP batch export (async, on shutdown)
Loading

Reviews (3): Last reviewed commit: "fix(config): preserve langfuse trace def..." | Re-trigger Greptile

Comment on lines +99 to +106
let sample_rate = langfuse.sample_rate.clamp(0.0, 1.0);
let provider = SdkTracerProvider::builder()
.with_sampler(Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(
sample_rate,
))))
.with_resource(Resource::builder_empty().with_attributes(resource).build())
.with_batch_exporter(exporter)
.build();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Sampled-out traces still produce stored trace IDs

Sampler::TraceIdRatioBased assigns a valid trace_id to every root span regardless of the sampling decision — only the export is suppressed. Consequently, when sample_rate < 1.0, trace_id_for_span returns a real hex trace ID even for runs that were never sent to Langfuse. Those IDs are then persisted in session history and rendered in the UI, where clicking the trace ID leads nowhere.

One option is to guard trace_id_for_span so it returns None when the span's SpanContext.is_sampled() flag is false:

fn trace_id_for_span(span: &Span) -> Option<String> {
    let context = span.context();
    let trace_span = context.span();
    let span_context = trace_span.span_context();
    (span_context.is_valid() && span_context.is_sampled())
        .then(|| span_context.trace_id().to_string())
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant