[codex] add Langfuse tracing and chat debug correlation#535
Conversation
Merging this PR will improve performance by 79.59%

| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ⚡ | env_substitution | 18.2 µs | 10.1 µs | +79.59% |

Comparing pickle-receipt (147d22a) with main (3272781)

Footnotes
- 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.
Codecov Report

❌ Patch coverage is
Greptile Summary

This PR adds opt-in Langfuse tracing via OpenTelemetry OTLP/HTTP, hooking into the existing

Two P2 style issues remain:
1. `crates/agents/Cargo.toml` and `crates/chat/Cargo.toml` — opentelemetry/tracing-opentelemetry should be optional deps with a feature gate
2. `crates/chat/src/lib.rs` — `LangfuseContentSettings` duplicates `ObservabilitySettings` in the runner crate

Confidence Score: 5/5 — Safe to merge; all prior P0/P1 review concerns are resolved, and the remaining findings are P2 style suggestions. The three specific concerns from the previous review thread are fully addressed.

Important Files Changed
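The feature-gated dependency setup the reviewer suggests for P2 issue (1) might look roughly like this in `crates/chat/Cargo.toml` — a sketch only; the version numbers and the `langfuse` feature name are assumptions, not taken from this PR:

```toml
[dependencies]
# Marked optional so the OTel stack is only compiled when the feature is on.
opentelemetry = { version = "0.27", optional = true }
tracing-opentelemetry = { version = "0.28", optional = true }

[features]
# Hypothetical feature name; enables the tracing integration on demand.
langfuse = ["dep:opentelemetry", "dep:tracing-opentelemetry"]
```

With this shape, `cargo build` stays lean by default and `cargo build --features langfuse` pulls in the tracing stack.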
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User / WS Client
    participant Chat as ChatService (chat/lib.rs)
    participant Runner as AgentRunner (runner.rs)
    participant LLM as LLM Provider
    participant Langfuse as Langfuse / OTLP
    User->>Chat: send message
    Chat->>Chat: create chat.run span (run_span)
    Chat->>Chat: extract trace_id (if is_valid && is_sampled)
    Chat->>Chat: store trace_id in ActiveAssistantDraft
    Chat->>Runner: run_agent_loop_streaming().instrument(run_span)
    loop Each LLM iteration
        Runner->>Runner: create llm.request span (child of run_span)
        Runner->>Runner: set input / metadata attributes
        Runner->>LLM: stream_with_tools()
        LLM-->>Runner: stream events
        Runner->>Runner: set output / usage attributes on llm_span
        opt Tool calls
            Runner->>Runner: create tool.call span (child of run_span)
            Runner->>Runner: execute tool
            Runner->>Runner: set tool output / error attributes
        end
    end
    Runner-->>Chat: AssistantTurnOutput (+ trace_id)
    Chat->>Chat: persist PersistedMessage::Assistant { trace_id }
    Chat->>User: ChatFinalBroadcast { traceId }
    Chat->>Langfuse: OTLP batch export (async, on shutdown)
```
Reviews (3): Last reviewed commit: "fix(config): preserve langfuse trace def..."
```rust
let sample_rate = langfuse.sample_rate.clamp(0.0, 1.0);
let provider = SdkTracerProvider::builder()
    .with_sampler(Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(
        sample_rate,
    ))))
    .with_resource(Resource::builder_empty().with_attributes(resource).build())
    .with_batch_exporter(exporter)
    .build();
```
Sampled-out traces still produce stored trace IDs
Sampler::TraceIdRatioBased assigns a valid trace_id to every root span regardless of the sampling decision — only the export is suppressed. Consequently, when sample_rate < 1.0, trace_id_for_span returns a real hex trace ID even for runs that were never sent to Langfuse. Those IDs are then persisted in session history and rendered in the UI, where clicking the trace ID leads nowhere.
One option is to guard trace_id_for_span so it returns None when the span's SpanContext.is_sampled() flag is false:
```rust
fn trace_id_for_span(span: &Span) -> Option<String> {
    let context = span.context();
    let trace_span = context.span();
    let span_context = trace_span.span_context();
    (span_context.is_valid() && span_context.is_sampled())
        .then(|| span_context.trace_id().to_string())
}
```
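The underlying reason for this bug can be sketched in plain Rust. A trace-ID-ratio sampler derives its decision from the trace ID itself, so a valid trace ID must exist before (and regardless of) the sampling outcome; only export is gated. The following is a minimal illustration of that idea under stated assumptions — `ratio_sampled` is a hypothetical helper mirroring the W3C/OTel trace-ID-ratio concept, not the actual `opentelemetry-sdk` implementation:

```rust
/// Hypothetical helper: decide sampling from the low 8 bytes of a
/// 16-byte trace ID. The ID exists either way; only the decision
/// about exporting the span changes with the rate.
fn ratio_sampled(trace_id: [u8; 16], sample_rate: f64) -> bool {
    let low = u64::from_be_bytes(trace_id[8..16].try_into().unwrap());
    // Map the rate onto the u64 range; rate 1.0 samples everything.
    let threshold = (sample_rate.clamp(0.0, 1.0) * u64::MAX as f64) as u64;
    low < threshold
}

fn main() {
    // Same trace ID, different sampling outcomes at different rates:
    let id = [0xAB; 16];
    println!("sampled at 1.0: {}", ratio_sampled(id, 1.0)); // true
    println!("sampled at 0.0: {}", ratio_sampled(id, 0.0)); // false
}
```

This is why gating `trace_id_for_span` on `is_sampled()` is the right shape of fix: the validity of the ID says nothing about whether the trace was exported.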
Summary
Validation
Completed
- `cargo +nightly-2025-11-30 fmt --all`
- `cargo check -p moltis-chat -p moltis-sessions -p moltis-agents -p moltis`
- `cargo check -p moltis-chat -p moltis-sessions`
- `cargo test -p moltis-common --lib`
- `cargo test -p moltis-sessions --lib`
- `cargo test -p moltis-chat chat_final_broadcast_serializes_trace_id_in_camel_case`
- `cargo test -p moltis-chat active_assistant_draft_persists_trace_id`
- `cargo test -p moltis-chat abort_waits_for_pending_tool_history_before_persisting_partial`
- `cargo test -p moltis-agents runner::tests::observability_settings_default_to_off_without_tool_context -- --exact`
- `cargo test -p moltis-agents runner::tests::merge_tool_context_skips_internal_telemetry_keys -- --exact`
- `cargo test -p moltis --bin moltis telemetry::tests::disabled_langfuse_returns_none -- --exact`
- `cargo test -p moltis --bin moltis telemetry::tests::enabled_langfuse_requires_credentials -- --exact`
- `biome check --write crates/web/src/assets/js/chat-ui.js crates/web/src/assets/js/websocket.js crates/web/src/assets/js/sessions.js`
- `biome check --write crates/web/src/assets/js/websocket.js crates/web/src/assets/js/components/run-detail.js`

Remaining
- `just lint` — blocked by an unrelated existing Clippy failure in `crates/voice/src/stt/elevenlabs.rs` (`clippy::manual_inspect`).
- `cargo check -p moltis-chat -p moltis-web` — blocked by unrelated existing errors in `crates/httpd/src/ssh_routes.rs` (`state.gateway.vault` no longer exists).

Manual QA
- `[metrics.langfuse]` in config with valid Langfuse credentials
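For the manual QA step, the config section might look roughly like the sketch below. The key names and endpoint are assumptions inferred from the settings this PR references (credentials, OTLP export, `sample_rate`), not copied from the actual config schema:

```toml
# Hypothetical shape of the opt-in Langfuse section; adjust keys to the
# real schema. sample_rate feeds the ParentBased(TraceIdRatioBased) sampler.
[metrics.langfuse]
enabled = true
host = "https://cloud.langfuse.com"   # OTLP/HTTP export target
public_key = "pk-lf-..."              # placeholder credential
secret_key = "sk-lf-..."              # placeholder credential
sample_rate = 1.0                     # clamped to [0.0, 1.0] at startup
```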