# docs: research GenAI instrumentation landscape (#154) #388
**Merged.** jaydeluca merged 8 commits from `LoveChauhan-18:research/154-genai-ecosystem` into `open-telemetry:main` on May 11, 2026.
**Commits** (8, all by LoveChauhan-18):

- `3ae33a8` docs: research GenAI instrumentation landscape (#154)
- `2b72e26` Merge branch 'main' into research/154-genai-ecosystem
- `be20b36` Merge branch 'main' into research/154-genai-ecosystem
- `ee7b072` docs: update Java GenAI instrumentation details per mentor feedback
- `4a83722` Merge branch 'main' into research/154-genai-ecosystem
- `3896559` Merge branch 'main' into research/154-genai-ecosystem
- `82de8b7` Merge branch 'main' into research/154-genai-ecosystem
- `4e0e2bc` Merge branch 'main' into research/154-genai-ecosystem
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,181 @@ | ||
---
title: "Research — GenAI instrumentation landscape"
issue: 154
type: audit
phase: 1
status: in-progress
last_updated: "2026-05-07"
---

## Research — GenAI instrumentation landscape

---

## Python

Python currently has the broadest instrumentation coverage of any language. The bulk of it sits in
[opentelemetry-python-contrib](https://github.com/open-telemetry/opentelemetry-python-contrib), with
active third-party alternatives from Traceloop and Arize.
| Framework / SDK | Library | Type | Signals | Semconv notes |
| --- | --- | --- | --- | --- |
| OpenAI Python SDK | `opentelemetry-instrumentation-openai` | Contrib | Traces, Metrics | Covers chat completions and embeddings. Emits `gen_ai.usage.*` token metrics. Missing tool-call span attributes. |
| Anthropic Python SDK | `opentelemetry-instrumentation-anthropic` | Contrib | Traces, Metrics | Similar coverage to the OpenAI instrumentation. Streaming is supported, but events are batched rather than emitted per token. |
| LangChain | `opentelemetry-instrumentation-langchain` | Contrib | Traces | Chain and LLM call spans. Tool execution spans exist, but `gen_ai.tool.*` attributes aren't stable in semconv yet, so naming varies. |
| LlamaIndex | `opentelemetry-instrumentation-llamaindex` | Contrib | Traces | Query and retrieval spans. RAG pipeline tracing works, but `gen_ai.retrieval.*` conventions are still experimental. |
| LiteLLM | Built-in LiteLLM callback | Native | Traces, Metrics | Covers 100+ providers through a single integration point. Attribute completeness varies by underlying provider. |
| Any Python LLM | openllmetry (Traceloop) | Third-party | Traces, Metrics, Logs | Broader framework coverage than contrib alone. Uses OTel semconv where stable, custom attributes elsewhere. |
| Any Python LLM | openinference (Arize) | Third-party | Traces | `openinference-semantic-conventions` is a parallel convention set; not directly compatible with OTel GenAI semconv. |

### Python observations

The contrib instrumentations for OpenAI and Anthropic are the most semconv-aligned. LangChain and
LlamaIndex tracing exists, but the agentic/RAG parts of the convention are still in flux, so those
spans use a mix of stable and experimental attributes.

LiteLLM is the most practical unified entry point for multi-provider coverage, but because it
proxies everything through one interface, provider-specific response attributes can be lossy.

The Arize openinference conventions are a separate fork of the semantics; frameworks that adopted
openinference early (several LlamaIndex integrations, for example) don't map cleanly to OTel semconv
without a translation layer.
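A translation layer of the kind described above can be sketched as a simple attribute-rename pass. The openinference-side key names below are illustrative assumptions; check the `openinference-semantic-conventions` package for the exact names before relying on any mapping:

```python
# Sketch of a translation layer from openinference-style span attributes to
# OTel GenAI semconv attributes. The source keys here are assumptions for
# illustration, not a verified list from the openinference package.

# Maps source attribute keys to their OTel `gen_ai.*` equivalents.
OPENINFERENCE_TO_OTEL = {
    "llm.model_name": "gen_ai.request.model",
    "llm.token_count.prompt": "gen_ai.usage.input_tokens",
    "llm.token_count.completion": "gen_ai.usage.output_tokens",
}


def translate_attributes(attrs: dict) -> dict:
    """Rename known keys to gen_ai.* semconv names; pass unknown keys through."""
    return {OPENINFERENCE_TO_OTEL.get(key, key): value for key, value in attrs.items()}


span_attrs = {"llm.model_name": "gpt-4o", "llm.token_count.prompt": 812, "trace.id": "abc"}
print(translate_attributes(span_attrs))
# {'gen_ai.request.model': 'gpt-4o', 'gen_ai.usage.input_tokens': 812, 'trace.id': 'abc'}
```

In practice this pass would run in a span processor or a collector pipeline, so that dashboards built on `gen_ai.*` attributes see a single convention regardless of which instrumentation produced the span.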

---

## JavaScript / TypeScript

Coverage here is thinner. The main OTel JS SDK has no GenAI-specific instrumentations in
opentelemetry-js-contrib yet; most production usage goes through third-party libraries.
| Framework / SDK | Library | Type | Signals | Semconv notes |
| --- | --- | --- | --- | --- |
| LangChain.js | `@traceloop/node-server-sdk` (openllmetry-js) | Third-party | Traces, Metrics | Functional tracing for chains and LLM calls. Attribute coverage is reasonable for stable semconv; tool and agent spans are custom. |
| LangChain.js | `@arizeai/openinference-instrumentation-langchain` | Third-party | Traces | openinference semantics, not OTel GenAI semconv. |
| OpenAI JS SDK | No dedicated instrumentation found | — | — | Manual instrumentation via `@opentelemetry/api` is the current path. |
| Vercel AI SDK | No dedicated instrumentation found | — | — | Some users wrap with manual spans; no contrib library. |
| Anthropic JS SDK | No dedicated instrumentation found | — | — | Same situation as the OpenAI JS SDK. |

### JavaScript / TypeScript observations

The JS/TS gap is notable. There is no equivalent of the Python contrib instrumentations for the
major SDKs. openllmetry-js covers LangChain.js reasonably well, but instrumenting the OpenAI or
Anthropic JS SDKs directly still requires manual work.

This is probably the clearest gap for the ecosystem explorer to surface — users reaching for the
OpenAI JS SDK expecting plug-and-play instrumentation won't find a contrib solution today.

---
## Java

Java instrumentation for GenAI is early-stage compared to Python. As of early 2026, the main
[opentelemetry-java-instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation)
repo has no dedicated GenAI instrumentations.

| Framework / SDK | Library | Type | Signals | Semconv notes |
| --- | --- | --- | --- | --- |
| AWS Bedrock | `opentelemetry-java-contrib` | Contrib | Traces | Partial. Inference spans exist; attribute coverage of the GenAI semconv is incomplete. |
| LangChain4j | No OTel instrumentation found | — | — | The framework has its own observability hooks but no OTel bridge. |
| Spring AI | No OTel instrumentation found | — | — | Spring AI integrates with Micrometer, and an OTel bridge exists for Micrometer, but GenAI-specific spans aren't emitted. |

### Java observations

Java is the weakest area. Spring AI plus Micrometer is the closest thing to structured telemetry,
but it isn't on the OTel GenAI semconv path. LangChain4j is widely used in the Java ecosystem but
has no instrumentation at all yet.

---

## .NET

.NET coverage is sparse. Microsoft Semantic Kernel is the dominant framework here.
| Framework / SDK | Library | Type | Signals | Semconv notes |
| --- | --- | --- | --- | --- |
| Semantic Kernel | Built-in activity source (`Microsoft.SemanticKernel`) | Native | Traces | Uses `System.Diagnostics.Activity`, compatible with OTel. Attribute naming predates the GenAI semconv; doesn't follow `gen_ai.*` conventions. |
| OpenAI .NET SDK | No dedicated OTel instrumentation found | — | — | Similar to the JS situation. |

### .NET observations

Semantic Kernel is an interesting case — it does emit spans, and those spans flow into OTel
collectors without issue, but the attribute names don't match the GenAI semconv. You get traces,
but they don't interoperate with dashboards built on `gen_ai.*` attributes.

---

## GenAI semantic conventions coverage

The [OTel GenAI semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/) has a stable core and
an experimental section. Coverage maps roughly as follows:
### Stable attributes (generally well-adopted in contrib instrumentations)

- `gen_ai.system` — present in all Python contrib instrumentations
- `gen_ai.operation.name` — present (`chat`, `text_completion`, `embeddings`)
- `gen_ai.request.model` — present
- `gen_ai.response.model` — present where the API returns it
- `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` — present in Python contrib, missing in
  most JS and Java instrumentations

### Experimental attributes (patchy adoption)

- `gen_ai.request.temperature`, `gen_ai.request.top_p`, `gen_ai.request.max_tokens` — present in
  OpenAI and Anthropic Python contrib, absent or inconsistent elsewhere
- `gen_ai.response.finish_reasons` — present in Python contrib
- `gen_ai.response.id` — present in OpenAI Python contrib, absent in others

### Not yet in most instrumentations

- `gen_ai.tool.*` — tool call spans exist in LangChain contrib, but attribute names are custom
- Agent and RAG spans — conventions are still being drafted; implementations vary widely
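The stable/experimental grouping above lends itself to a mechanical coverage check, in the spirit of what genai-otel-conformance automates. The sketch below uses the attribute lists from this document; the grouping is this document's working assumption, not the spec itself, and the real conformance suite performs far more detailed checks:

```python
# Minimal coverage checker: given the attribute names observed on a span,
# report which stable attributes are present or missing and which
# experimental attributes appear. The stable/experimental grouping mirrors
# the lists in this document, not the semconv spec directly.

STABLE = {
    "gen_ai.system",
    "gen_ai.operation.name",
    "gen_ai.request.model",
    "gen_ai.response.model",
    "gen_ai.usage.input_tokens",
    "gen_ai.usage.output_tokens",
}
EXPERIMENTAL = {
    "gen_ai.request.temperature",
    "gen_ai.request.top_p",
    "gen_ai.request.max_tokens",
    "gen_ai.response.finish_reasons",
    "gen_ai.response.id",
}


def coverage_report(span_attrs: set) -> dict:
    """Classify observed attribute names against the stable/experimental lists."""
    return {
        "stable_present": sorted(STABLE & span_attrs),
        "stable_missing": sorted(STABLE - span_attrs),
        "experimental_present": sorted(EXPERIMENTAL & span_attrs),
    }


observed = {"gen_ai.system", "gen_ai.request.model", "gen_ai.request.temperature"}
report = coverage_report(observed)
print(report["stable_missing"])
# ['gen_ai.operation.name', 'gen_ai.response.model',
#  'gen_ai.usage.input_tokens', 'gen_ai.usage.output_tokens']
```

Run against spans exported by each instrumentation, a report like this would make the per-framework coverage tables earlier in this document reproducible rather than hand-maintained.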

---

## Patterns observed

**Provider-level vs. framework-level instrumentation.** Python contrib instruments at the SDK level
(OpenAI, Anthropic). LangChain and LlamaIndex instrumentations sit above that and emit
chain/workflow spans separately. A LangChain app using the OpenAI SDK can therefore end up with
both sets of spans, which is usually useful but adds cardinality.

**Streaming support is handled inconsistently.** Most instrumentations batch streaming responses and
emit a single span when the stream closes. This is correct per current semconv guidance, but it
means you can't observe time-to-first-token from spans alone.
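One way an instrumentation could recover time-to-first-token without abandoning the single-span model is to attach a span event when the first chunk arrives. The sketch below models the span event as a plain dict and uses a stand-in generator rather than a real provider SDK; the event name `gen_ai.first_token` is an assumption for illustration, not a semconv-defined name:

```python
import time


def fake_stream():
    """Stand-in for a provider's streaming response (not a real SDK)."""
    for token in ["Hello", ",", " world"]:
        time.sleep(0.01)  # simulate per-chunk network latency
        yield token


def consume_with_ttft(stream):
    """Consume a token stream, recording time-to-first-token as an event."""
    start = time.monotonic()
    events = []
    tokens = []
    for i, token in enumerate(stream):
        if i == 0:
            # In a real instrumentation this would be span.add_event(...);
            # the event name "gen_ai.first_token" is hypothetical.
            events.append({"name": "gen_ai.first_token",
                           "ttft_s": time.monotonic() - start})
        tokens.append(token)
    return "".join(tokens), events


text, events = consume_with_ttft(fake_stream())
print(text, events[0]["ttft_s"])
```

The span still closes once at end of stream, so this stays compatible with the batching behavior described above while exposing TTFT to anything that reads span events.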

**Third-party convention fragmentation.** openllmetry and openinference both fill real gaps, but
their attribute naming diverges from each other and, in places, from OTel semconv. A dashboard that
queries `gen_ai.*` attributes won't get data from openinference-instrumented apps without a mapping
layer.

**Java and .NET are 1–2 years behind Python.** The frameworks exist and are in production use, but
the instrumentation layer hasn't caught up. Spring AI and LangChain4j in particular are large
enough that this gap is worth flagging.

---
## Research conclusions

1. **Python is the reference implementation.** Any future registry schema for GenAI should be
   modeled after the Python contrib instrumentations, which currently have the most complete
   mapping to OTel semantic conventions.
2. **The JS/TS gap is critical.** While Traceloop covers LangChain.js, there is no direct,
   low-dependency, OTel-native way to instrument the OpenAI JS SDK. This is a major opportunity
   for the OpenTelemetry ecosystem.
3. **Java and .NET require bridge-level research.** Since these ecosystems rely on Spring AI,
   LangChain4j, and Semantic Kernel, the focus should be on how these frameworks' native telemetry
   can be mapped or exported to OTel, rather than on building separate instrumentations.
4. **Semantic convention convergence is ongoing.** Adoption of the experimental attributes (such as
   temperature, top_p, and tool calls) is inconsistent. The
   [genai-otel-conformance](https://github.com/trask/genai-otel-conformance) project should serve
   as the primary benchmark for verifying future instrumentation support.

---

## Related resources

- [OTel GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
- [opentelemetry-python-contrib GenAI instrumentations](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation)
- [genai-otel-conformance](https://github.com/trask/genai-otel-conformance) — automated attribute-level coverage tests
- [openllmetry](https://github.com/traceloop/openllmetry) (Traceloop)
- [openinference](https://github.com/Arize-ai/openinference) (Arize)
---
title: "Roadmap — GenAI ecosystem research"
issue: 154
type: roadmap
phase: meta
status: in-progress
last_updated: "2026-05-07"
---

## Next steps

---

## Where we are

- Initial research sweep complete: frameworks surveyed across Python, JS/TS, Java, and .NET.
- Findings documented in [`00-research.md`](./00-research.md), covering instrumentation type
  (native / contrib / third-party), signals captured, and semconv adoption level per framework.
- The [genai-otel-conformance](https://github.com/trask/genai-otel-conformance) project identified
  as a useful integration point — it runs automated attribute-level coverage tests for 40+
  libraries.

---

## Immediate next steps

In order:

- [ ] Post a findings summary as a comment on
      [#154](https://github.com/open-telemetry/opentelemetry-ecosystem-explorer/issues/154) to get
      maintainer feedback before any registry work begins.
- [ ] Confirm with maintainers how GenAI instrumentation should be represented in the explorer: as
      a new ecosystem entry, as a category within existing ecosystems, or as a separate data type.
- [ ] Review the genai-otel-conformance dashboard for per-attribute coverage data that could feed
      into the registry.
- [ ] Identify which frameworks are structured enough to support automated data extraction and
      which need manual curation.

---

## Open questions

| # | Question | Status |
| --- | --- | --- |
| 1 | Should GenAI instrumentation be a new top-level ecosystem in the explorer, or surfaced as a signal/category on existing entries? | Open |
| 2 | Is the genai-otel-conformance dashboard an integration target (i.e., pull its data into the registry) or just a reference? | Open |
| 3 | How should third-party (non-OTel-org) instrumentation libraries be classified vs. contrib/native ones? | Open |
| 4 | Which languages are in scope for a first registry cut? (Python + JS/TS seem clearest; Java and .NET have less coverage.) | Open |

---

## Decision log

| Date | Decision | Notes |
| --- | --- | --- |
| 2026-05-07 | Scoped initial research to Python, JS/TS, Java, and .NET per the issue description. | Aligns with the four languages called out in the issue scope. |
---
title: "Issue #154 — Research GenAI Ecosystem"
issue: 154
type: index
phase: meta
status: in-progress
last_updated: "2026-05-07"
---

## Issue #154 — Research GenAI Ecosystem

---

## What this folder is

A research workspace for mapping the GenAI/LLM instrumentation landscape so that users of the
ecosystem explorer can understand what's available, what telemetry each framework emits, and how
complete semantic convention adoption is.

OTel is actively developing
[GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/), but the
frameworks are moving fast. This folder documents where things stand: which frameworks have OTel
instrumentation, what signals they emit, and where the gaps are.

---

## Where to start

1. **[`_index.md`](./_index.md)** _(you are here)_ — folder landing page.
2. **[`NEXT-STEPS.md`](./NEXT-STEPS.md)** — rolling roadmap, open questions, and what comes next.
3. **[`00-research.md`](./00-research.md)** — the research findings: frameworks by language,
   signals captured, semantic convention coverage, and patterns observed.

---

## Files in this folder

| File | Purpose |
| --- | --- |
| [`_index.md`](./_index.md) | This file. Stable folder landing page. |
| [`NEXT-STEPS.md`](./NEXT-STEPS.md) | Rolling roadmap — open questions, decisions, immediate next steps. |
| [`00-research.md`](./00-research.md) | Research findings: frameworks, signals, semconv coverage, patterns. |

---

## Workspace status

This initiative is currently in **Phase 1 (Research)**. Progress is tracked in
[`NEXT-STEPS.md`](./NEXT-STEPS.md).