From 3ae33a8eee709fbf2d8070b40c1498ba4d11bcde Mon Sep 17 00:00:00 2001 From: Love Kumar Chauhan Date: Thu, 7 May 2026 22:10:06 +0530 Subject: [PATCH 1/2] docs: research GenAI instrumentation landscape (#154) Includes audit of Python, JS/TS, Java, and .NET ecosystems and semantic convention coverage. --- projects/154-genai-ecosystem/00-research.md | 181 ++++++++++++++++++++ projects/154-genai-ecosystem/NEXT-STEPS.md | 56 ++++++ projects/154-genai-ecosystem/_index.md | 49 ++++++ projects/_index.md | 9 + 4 files changed, 295 insertions(+) create mode 100644 projects/154-genai-ecosystem/00-research.md create mode 100644 projects/154-genai-ecosystem/NEXT-STEPS.md create mode 100644 projects/154-genai-ecosystem/_index.md diff --git a/projects/154-genai-ecosystem/00-research.md b/projects/154-genai-ecosystem/00-research.md new file mode 100644 index 00000000..5d64837d --- /dev/null +++ b/projects/154-genai-ecosystem/00-research.md @@ -0,0 +1,181 @@ +--- +title: "Research — GenAI instrumentation landscape" +issue: 154 +type: audit +phase: 1 +status: in-progress +last_updated: "2026-05-07" +--- + +## Research — GenAI instrumentation landscape + +--- + +## Python + +Python has the most instrumentation coverage of any language right now. The bulk of it sits in +[opentelemetry-python-contrib](https://github.com/open-telemetry/opentelemetry-python-contrib), with +active third-party alternatives from Traceloop and Arize. + +| Framework / SDK | Library | Type | Signals | Semconv notes | +| -------------------- | ------------------------------------------ | ----------- | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | +| OpenAI Python SDK | `opentelemetry-instrumentation-openai` | Contrib | Traces, Metrics | Covers chat completions and embeddings. Emits `gen_ai.usage.*` token metrics. Missing tool-call span attributes. 
| +| Anthropic Python SDK | `opentelemetry-instrumentation-anthropic` | Contrib | Traces, Metrics | Similar coverage to OpenAI instrumentation. Streaming support present but events are batched, not streamed per token. | +| LangChain | `opentelemetry-instrumentation-langchain` | Contrib | Traces | Chain and LLM call spans. Tool execution spans exist but `gen_ai.tool.*` attributes aren't stable yet in semconv so naming varies. | +| LlamaIndex | `opentelemetry-instrumentation-llamaindex` | Contrib | Traces | Query and retrieval spans. RAG pipeline tracing is functional but `gen_ai.retrieval.*` conventions are still experimental. | +| LiteLLM | Built-in LiteLLM callback | Native | Traces, Metrics | Covers 100+ providers through a single integration point. Attribute completeness varies by provider underneath. | +| Any Python LLM | openllmetry (Traceloop) | Third-party | Traces, Metrics, Logs | Broader framework coverage than contrib alone. Uses OTel semconv where stable, custom attributes elsewhere. | +| Any Python LLM | openinference (Arize) | Third-party | Traces | `openinference-semantic-conventions` is a parallel convention set; not compatible with OTel GenAI semconv directly. | + +### Python observations + +The contrib instrumentations for OpenAI and Anthropic are the most semconv-aligned. LangChain and +LlamaIndex tracing exists but the agentic/RAG parts of the convention are still in flux, so those +spans use a mix of stable and experimental attributes. + +LiteLLM is the most practical unified entry point for multi-provider coverage, but because it +proxies everything through one interface, provider-specific response attributes can be lossy. + +The Arize openinference conventions are a separate fork of the semantics; frameworks that adopted +openinference early (several LlamaIndex integrations, for example) don't map cleanly to OTel semconv +without a translation layer. + +--- + +## JavaScript / TypeScript + +Coverage here is thinner. 
The main OTel JS SDK has no GenAI-specific instrumentations in +opentelemetry-js-contrib yet. Most production usage goes through third-party libraries. + +| Framework / SDK | Library | Type | Signals | Semconv notes | +| ---------------- | -------------------------------------------------- | ----------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------- | +| LangChain.js | `@traceloop/node-server-sdk` (openllmetry-js) | Third-party | Traces, Metrics | Functional tracing for chains and LLM calls. Attribute coverage is reasonable for stable semconv; tool and agent spans are custom. | +| LangChain.js | `@arizeai/openinference-instrumentation-langchain` | Third-party | Traces | openinference semantics, not OTel GenAI semconv. | +| OpenAI JS SDK | No dedicated instrumentation found | — | — | Manual instrumentation via `@opentelemetry/api` is the current path. | +| Vercel AI SDK | No dedicated instrumentation found | — | — | Some users wrap with manual spans; no contrib library. | +| Anthropic JS SDK | No dedicated instrumentation found | — | — | Same situation as OpenAI JS. | + +### JavaScript / TypeScript observations + +The JS/TS gap is notable. There's no equivalent of the Python contrib instrumentations for the major +SDKs. openllmetry-js covers LangChain.js reasonably well, but direct OpenAI/Anthropic JS +instrumentation requires manual work. + +This is probably the clearest gap for the ecosystem explorer to surface — users reaching for the +OpenAI JS SDK expecting plug-and-play instrumentation won't find a contrib solution today. + +--- + +## Java + +Java instrumentation for GenAI is early-stage compared to Python. The main +[opentelemetry-java-instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation) +repo doesn't have dedicated GenAI instrumentations yet as of early 2026. 
+ +| Framework / SDK | Library | Type | Signals | Semconv notes | +| --------------- | ----------------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------- | +| AWS Bedrock | `opentelemetry-java-contrib` | Contrib | Traces | Partial. Inference spans exist; attribute coverage of GenAI semconv is incomplete. | +| LangChain4j | No OTel instrumentation found | — | — | Framework has its own observability hooks but no OTel bridge. | +| Spring AI | No OTel instrumentation found | — | — | Spring AI has Micrometer integration; an OTel bridge exists for Micrometer but GenAI-specific spans aren't emitted. | + +### Java observations + +Java is the weakest area. Spring AI + Micrometer is the closest thing to structured telemetry but +it's not on the OTel GenAI semconv path. LangChain4j is widely used in the Java ecosystem but has no +instrumentation at all yet. + +--- + +## .NET + +.NET coverage is sparse. Microsoft Semantic Kernel is the dominant framework here. + +| Framework / SDK | Library | Type | Signals | Semconv notes | +| --------------- | ----------------------------------------------------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------- | +| Semantic Kernel | Built-in activity source (`Microsoft.SemanticKernel`) | Native | Traces | Uses `System.Diagnostics.Activity`, compatible with OTel. Attribute naming predates the GenAI semconv; doesn't follow `gen_ai.*` conventions. | +| OpenAI .NET SDK | No dedicated OTel instrumentation found | — | — | Similar to the JS situation. | + +### .NET observations + +Semantic Kernel is interesting — it does emit spans, and those spans flow into OTel collectors fine, +but the attribute names don't match the GenAI semconv. So you get traces but they don't interoperate +with dashboards built on `gen_ai.*` attributes. 
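The mismatch is mechanical rather than conceptual: the data is present, just under different keys. A minimal sketch of the translation-layer idea — language-agnostic, shown here in Python for brevity, and using **hypothetical** source attribute names, not Semantic Kernel's actual ones:

```python
# Illustrative only: the left-hand keys are hypothetical stand-ins for
# framework-specific attribute names, NOT Semantic Kernel's real ones.
SEMCONV_MAP = {
    "sk.model_id": "gen_ai.request.model",
    "sk.prompt_token_count": "gen_ai.usage.input_tokens",
    "sk.completion_token_count": "gen_ai.usage.output_tokens",
}


def translate_attributes(attrs: dict) -> dict:
    """Rename framework-specific span attributes to gen_ai.* keys.

    Unknown keys pass through unchanged, so nothing is dropped.
    """
    return {SEMCONV_MAP.get(key, key): value for key, value in attrs.items()}
```

In practice this kind of remapping would live in a span processor or an OTel Collector transform, so dashboards built on `gen_ai.*` attributes pick up the renamed data without touching the framework.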
+ +--- + +## GenAI semantic conventions coverage + +The [OTel GenAI semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/) has a stable core and +an experimental section. Here's roughly how coverage maps: + +### Stable attributes (generally well-adopted in contrib instrumentations) + +- `gen_ai.system` — present in all Python contrib instrumentations +- `gen_ai.operation.name` — present (`chat`, `text_completion`, `embeddings`) +- `gen_ai.request.model` — present +- `gen_ai.response.model` — present where the API returns it +- `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` — present in Python contrib, missing in + most JS and Java + +### Experimental attributes (patchy adoption) + +- `gen_ai.request.temperature`, `gen_ai.request.top_p`, `gen_ai.request.max_tokens` — present in + OpenAI and Anthropic Python contrib, absent or inconsistent elsewhere +- `gen_ai.response.finish_reasons` — present in Python contrib +- `gen_ai.response.id` — present in OpenAI Python contrib, absent in others + +### Not yet in most instrumentations + +- `gen_ai.tool.*` — tool call spans exist in LangChain contrib but attribute names are custom +- Agent and RAG spans — conventions are still being drafted; implementations vary widely + +--- + +## Patterns observed + +**Provider-level vs. framework-level instrumentation.** Python contrib instruments at the SDK level +(OpenAI, Anthropic). LangChain and LlamaIndex instrumentations sit above that and emit +chain/workflow spans separately. This means a LangChain app using the OpenAI SDK can end up with +both sets of spans, which is usually useful but adds cardinality. + +**Streaming support is inconsistently handled.** Most instrumentations batch streaming responses and +emit a single span when the stream closes. This is correct per the current semconv guidance but +means you can't observe time-to-first-token from spans alone. 
+ +**Third-party convention fragmentation.** openllmetry and openinference both fill real gaps but +their attribute naming diverges from each other and from OTel semconv in places. Someone building a +dashboard that queries `gen_ai.*` attributes won't get data from openinference-instrumented apps +without a mapping layer. + +**Java and .NET are 1–2 years behind Python.** The frameworks exist and are in production use, but +the instrumentation layer hasn't caught up. Spring AI and LangChain4j in particular are large enough +that this gap is worth flagging. + +--- + +## Research conclusions + +1. **Python is the reference implementation.** Any future registry schema for GenAI should be + modeled after the Python contrib instrumentations, as they currently have the most complete + mapping to OTel semantic conventions. +2. **The JS/TS gap is critical.** While Traceloop covers LangChain.js, there is no direct, + low-dependency OTel-native way to instrument the OpenAI JS SDK. This is a major opportunity for + the OpenTelemetry ecosystem. +3. **Java and .NET require bridge-level research.** Since these ecosystems rely on Spring AI, + LangChain4j, and Semantic Kernel, the focus should be on how these frameworks' native telemetry + can be mapped or exported to OTel, rather than building separate instrumentations. +4. **Semantic convention convergence is ongoing.** The "experimental" attributes (like temperature, + top_p, and tool calls) are inconsistent. The + [genai-otel-conformance](https://github.com/trask/genai-otel-conformance) project should be used + as the primary benchmark for verifying future instrumentation support. 
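As a concrete illustration of conclusion 4, checking a span against the stable attribute set listed in the coverage section takes only a few lines. This is a sketch in the spirit of genai-otel-conformance, not its actual API:

```python
# The stable GenAI semconv attributes surveyed in the coverage section.
STABLE_GENAI_ATTRS = {
    "gen_ai.system",
    "gen_ai.operation.name",
    "gen_ai.request.model",
    "gen_ai.response.model",
    "gen_ai.usage.input_tokens",
    "gen_ai.usage.output_tokens",
}


def missing_stable_attrs(span_attributes: dict) -> list:
    """Return the stable gen_ai.* attributes a span fails to set."""
    return sorted(STABLE_GENAI_ATTRS - set(span_attributes))
```

Running a check like this against spans produced by each library is essentially what attribute-level conformance testing amounts to, applied per instrumentation rather than per span.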
+ +--- + +## Related resources + +- [OTel GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) +- [opentelemetry-python-contrib GenAI instrumentations](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation) +- [genai-otel-conformance](https://github.com/trask/genai-otel-conformance) — automated + attribute-level coverage tests. +- [openllmetry](https://github.com/traceloop/openllmetry) (Traceloop) +- [openinference](https://github.com/Arize-ai/openinference) (Arize) diff --git a/projects/154-genai-ecosystem/NEXT-STEPS.md b/projects/154-genai-ecosystem/NEXT-STEPS.md new file mode 100644 index 00000000..1a88cef8 --- /dev/null +++ b/projects/154-genai-ecosystem/NEXT-STEPS.md @@ -0,0 +1,56 @@ +--- +title: "Roadmap — GenAI ecosystem research" +issue: 154 +type: roadmap +phase: meta +status: in-progress +last_updated: "2026-05-07" +--- + +## Next steps + +--- + +## Where we are + +- Initial research sweep complete: frameworks surveyed across Python, JS/TS, Java, and .NET. +- Findings documented in [`00-research.md`](./00-research.md), covering instrumentation type (native + / contrib / third-party), signals captured, and semconv adoption level per framework. +- The [genai-otel-conformance](https://github.com/trask/genai-otel-conformance) project identified + as a useful integration point — it runs automated attribute-level coverage tests for 40+ + libraries. + +--- + +## Immediate next steps + +In order: + +- [ ] Post findings summary as a comment on + [#154](https://github.com/open-telemetry/opentelemetry-ecosystem-explorer/issues/154) to get + maintainer feedback before any registry work begins. +- [ ] Confirm with maintainers how GenAI instrumentation should be represented in the explorer: as a + new ecosystem entry, as a category within existing ecosystems, or as a separate data type. +- [ ] Review the genai-otel-conformance dashboard for per-attribute coverage data that could feed + into the registry. 
+- [ ] Identify which frameworks are structured enough to support automated data extraction vs. which + need manual curation. + +--- + +## Open questions + +| # | Question | Status | +| --- | -------------------------------------------------------------------------------------------------------------------------------- | ------ | +| 1 | Should GenAI instrumentation be a new top-level ecosystem in the explorer, or surfaced as a signal/category on existing entries? | Open | +| 2 | Is the genai-otel-conformance dashboard an integration target (i.e., pull its data into the registry) or just a reference? | Open | +| 3 | How should third-party (non-OTel-org) instrumentation libraries be classified vs. contrib/native ones? | Open | +| 4 | Which languages are in scope for a first registry cut? (Python + JS/TS seem clearest; Java and .NET have less coverage.) | Open | + +--- + +## Decision log + +| Date | Decision | Notes | +| ---------- | ----------------------------------------------------------------------------------- | ------------------------------------------------------------- | +| 2026-05-07 | Scoped initial research to Python, JS/TS, Java, and .NET per the issue description. | Aligns with the four languages called out in the issue scope. | diff --git a/projects/154-genai-ecosystem/_index.md b/projects/154-genai-ecosystem/_index.md new file mode 100644 index 00000000..c9a98222 --- /dev/null +++ b/projects/154-genai-ecosystem/_index.md @@ -0,0 +1,49 @@ +--- +title: "Issue #154 — Research GenAI Ecosystem" +issue: 154 +type: index +phase: meta +status: in-progress +last_updated: "2026-05-07" +--- + +## Issue #154 — Research GenAI Ecosystem + +--- + +## What this folder is + +A research workspace for mapping the GenAI/LLM instrumentation landscape so that users of the +ecosystem explorer can understand what's available, what telemetry each framework emits, and how +complete the semantic convention adoption is. 
+ +OTel is actively developing +[GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/), but the +frameworks are moving fast. This folder documents where things stand: which frameworks have OTel +instrumentation, what signals they emit, and where the gaps are. + +--- + +## Where to start + +1. **[`_index.md`](./_index.md)** _(you are here)_ — folder landing page. +2. **[`NEXT-STEPS.md`](./NEXT-STEPS.md)** — rolling roadmap, open questions, and what comes next. +3. **[`00-research.md`](./00-research.md)** — the research findings: frameworks by language, signals + captured, semantic convention coverage, and patterns observed. + +--- + +## Files in this folder + +| File | Purpose | +| ------------------------------------ | ------------------------------------------------------------------- | +| [`_index.md`](./_index.md) | This file. Stable folder landing page. | +| [`NEXT-STEPS.md`](./NEXT-STEPS.md) | Rolling roadmap — open questions, decisions, immediate next steps. | +| [`00-research.md`](./00-research.md) | Research findings: frameworks, signals, semconv coverage, patterns. | + +--- + +## Workspace status + +This initiative is currently in **Phase 1 (Research)**. Progress is tracked in +[`NEXT-STEPS.md`](./NEXT-STEPS.md). 
diff --git a/projects/_index.md b/projects/_index.md index 1e994ddf..159b38b1 100644 --- a/projects/_index.md +++ b/projects/_index.md @@ -24,6 +24,15 @@ projects/ --- +## Current initiatives + +| Folder | Issue | Description | Status | +| ------------------------------------------------ | ------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- | ----------- | +| [`84-ui-ux-design/`](./84-ui-ux-design/) | [#84](https://github.com/open-telemetry/opentelemetry-ecosystem-explorer/issues/84) | Explorer UI/UX redesign — visual alignment with opentelemetry.io, phased across five page areas. | planning | +| [`154-genai-ecosystem/`](./154-genai-ecosystem/) | [#154](https://github.com/open-telemetry/opentelemetry-ecosystem-explorer/issues/154) | Research GenAI ecosystem — survey of GenAI/LLM instrumentation libraries and semantic convention coverage across languages. 
| in-progress | + +--- + ## Folder convention Each significant initiative gets its own subfolder, named after the GitHub issue that tracks it: From ee7b072902c007c3a9c27bab83b91642cdb29b28 Mon Sep 17 00:00:00 2001 From: Love Kumar Chauhan Date: Fri, 8 May 2026 23:50:10 +0530 Subject: [PATCH 2/2] docs: update Java GenAI instrumentation details per mentor feedback --- projects/154-genai-ecosystem/00-research.md | 35 +++++++++++---------- projects/154-genai-ecosystem/NEXT-STEPS.md | 4 +-- projects/154-genai-ecosystem/_index.md | 2 +- 3 files changed, 22 insertions(+), 19 deletions(-) diff --git a/projects/154-genai-ecosystem/00-research.md b/projects/154-genai-ecosystem/00-research.md index 5d64837d..f873ee9b 100644 --- a/projects/154-genai-ecosystem/00-research.md +++ b/projects/154-genai-ecosystem/00-research.md @@ -4,7 +4,7 @@ issue: 154 type: audit phase: 1 status: in-progress -last_updated: "2026-05-07" +last_updated: "2026-05-08" --- ## Research — GenAI instrumentation landscape @@ -68,21 +68,23 @@ OpenAI JS SDK expecting plug-and-play instrumentation won't find a contrib solut ## Java -Java instrumentation for GenAI is early-stage compared to Python. The main +Java instrumentation for GenAI is rapidly evolving. The main [opentelemetry-java-instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation) -repo doesn't have dedicated GenAI instrumentations yet as of early 2026. +agent now includes dedicated support for major providers. -| Framework / SDK | Library | Type | Signals | Semconv notes | -| --------------- | ----------------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------- | -| AWS Bedrock | `opentelemetry-java-contrib` | Contrib | Traces | Partial. Inference spans exist; attribute coverage of GenAI semconv is incomplete. 
| -| LangChain4j | No OTel instrumentation found | — | — | Framework has its own observability hooks but no OTel bridge. | -| Spring AI | No OTel instrumentation found | — | — | Spring AI has Micrometer integration; an OTel bridge exists for Micrometer but GenAI-specific spans aren't emitted. | +| Framework / SDK | Library | Type | Signals | Semconv notes | +| --------------- | ------------------------------------ | ----- | --------------- | ---------------------------------------------------------------------------------------- | +| OpenAI Java SDK | `opentelemetry-java-instrumentation` | Agent | Traces, Metrics | Standardized GenAI client spans and metrics (token usage, model) following OTel semconv. | +| AWS Bedrock | `opentelemetry-java-instrumentation` | Agent | Traces | Supported via AWS SDK instrumentation. Captures inference spans. | +| LangChain4j | No OTel instrumentation found | — | — | Framework has its own observability hooks but no direct OTel bridge yet. | +| Spring AI | No OTel instrumentation found | — | — | Spring AI has Micrometer integration; an OTel bridge exists for Micrometer. | ### Java observations -Java is the weakest area. Spring AI + Micrometer is the closest thing to structured telemetry but -it's not on the OTel GenAI semconv path. LangChain4j is widely used in the Java ecosystem but has no -instrumentation at all yet. +Java coverage is stronger than initially reported, with the official agent supporting OpenAI and +Bedrock (via the AWS SDK). While high-level frameworks like Spring AI and LangChain4j lack direct +OTel-native instrumentations, the underlying SDK support provides a solid foundation for capturing +standardized GenAI telemetry. --- @@ -147,9 +149,9 @@ their attribute naming diverges from each other and from OTel semconv in places. dashboard that queries `gen_ai.*` attributes won't get data from openinference-instrumented apps without a mapping layer. 
-**Java and .NET are 1–2 years behind Python.** The frameworks exist and are in production use, but -the instrumentation layer hasn't caught up. Spring AI and LangChain4j in particular are large enough -that this gap is worth flagging. +**Java and .NET are catching up.** The frameworks exist and are in production use. Java's official +agent now covers major SDKs, but higher-level framework integration (Spring AI, LangChain4j) is +still emerging. .NET remains focused on Semantic Kernel. --- @@ -161,9 +163,10 @@ that this gap is worth flagging. 2. **The JS/TS gap is critical.** While Traceloop covers LangChain.js, there is no direct, low-dependency OTel-native way to instrument the OpenAI JS SDK. This is a major opportunity for the OpenTelemetry ecosystem. -3. **Java and .NET require bridge-level research.** Since these ecosystems rely on Spring AI, +3. **Java and .NET require framework-level research.** Since these ecosystems rely on Spring AI, LangChain4j, and Semantic Kernel, the focus should be on how these frameworks' native telemetry - can be mapped or exported to OTel, rather than building separate instrumentations. + can be mapped or exported to OTel, supplementing the SDK-level instrumentation already present in + Java. 4. **Semantic convention convergence is ongoing.** The "experimental" attributes (like temperature, top_p, and tool calls) are inconsistent. 
The [genai-otel-conformance](https://github.com/trask/genai-otel-conformance) project should be used diff --git a/projects/154-genai-ecosystem/NEXT-STEPS.md b/projects/154-genai-ecosystem/NEXT-STEPS.md index 1a88cef8..c331ac07 100644 --- a/projects/154-genai-ecosystem/NEXT-STEPS.md +++ b/projects/154-genai-ecosystem/NEXT-STEPS.md @@ -4,7 +4,7 @@ issue: 154 type: roadmap phase: meta status: in-progress -last_updated: "2026-05-07" +last_updated: "2026-05-08" --- ## Next steps @@ -45,7 +45,7 @@ In order: | 1 | Should GenAI instrumentation be a new top-level ecosystem in the explorer, or surfaced as a signal/category on existing entries? | Open | | 2 | Is the genai-otel-conformance dashboard an integration target (i.e., pull its data into the registry) or just a reference? | Open | | 3 | How should third-party (non-OTel-org) instrumentation libraries be classified vs. contrib/native ones? | Open | -| 4 | Which languages are in scope for a first registry cut? (Python + JS/TS seem clearest; Java and .NET have less coverage.) | Open | +| 4 | Which languages are in scope for a first registry cut? (Python + JS/TS and Java now seem viable; .NET remains behind.) | Open | --- diff --git a/projects/154-genai-ecosystem/_index.md b/projects/154-genai-ecosystem/_index.md index c9a98222..16256fb5 100644 --- a/projects/154-genai-ecosystem/_index.md +++ b/projects/154-genai-ecosystem/_index.md @@ -4,7 +4,7 @@ issue: 154 type: index phase: meta status: in-progress -last_updated: "2026-05-07" +last_updated: "2026-05-08" --- ## Issue #154 — Research GenAI Ecosystem