docs: add research findings for issue 119 V1 vs V2 registry comparison #446
Changes from 8 commits
2aec4b3
284950a
659ff88
a1c15a4
2a87779
db6c19b
3cfdb5b
ddf63f2
d541024
ce662df
c056680
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,199 @@ | ||
| --- | ||
| title: "Research — V1 vs V2 Registry: Collector Component Coverage and Metadata" | ||
| issue: 119 | ||
| type: audit | ||
| phase: 1 | ||
| status: in-progress | ||
| last_updated: "2026-05-11" | ||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| This document captures the research findings for | ||
| [issue #119](https://github.com/open-telemetry/opentelemetry-ecosystem-explorer/issues/119). It | ||
| compares the V1 registry (opentelemetry.io/data/registry) and the V2 registry | ||
| (ecosystem-explorer/ecosystem-registry) focusing on collector components, their coverage, their | ||
| metadata, and the automation that maintains each. | ||
|
|
||
| --- | ||
|
|
||
| ## Registry Definitions | ||
|
|
||
| | Name | Location | Purpose | | ||
| | ------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | ||
| | **V1 registry** ("otel io registry") | [opentelemetry.io/data/registry](https://github.com/open-telemetry/opentelemetry.io/tree/main/data/registry) | Flat YAML files, one per component, used to power the registry section of opentelemetry.io and to inject component versions into docs via Hugo shortcodes. | | ||
| | **V2 registry** ("explorer registry") | [ecosystem-explorer/ecosystem-registry](https://github.com/open-telemetry/opentelemetry-ecosystem-explorer/tree/main/ecosystem-registry) | Version-stamped YAML files per release, richer metadata sourced from component metadata.yaml files in the collector repos, used to power the Ecosystem Explorer website. | | ||
|
|
||
| --- | ||
|
|
||
| ## Component Coverage | ||
|
|
||
| ### V2 Registry (latest: v0.151.0) | ||
|
|
||
| Components tracked per distribution and type: | ||
|
|
||
| #### Contrib distribution | ||
|
|
||
| | Type | Count | | ||
| | --------- | ------- | | ||
| | receiver | 113 | | ||
| | exporter | 48 | | ||
| | processor | 31 | | ||
| | extension | 43 | | ||
| | connector | 14 | | ||
| | **Total** | **249** | | ||
|
|
||
| **Core distribution** is also tracked separately and includes the standard set of components bundled | ||
| with the collector core (otlp receiver/exporter, batch processor, etc.). | ||
|
|
||
| ### V1 Registry (as of May 2026) | ||
|
|
||
| Based on the registry contents at opentelemetry.io/data/registry filtered to `language: collector`: | ||
|
|
||
| | Type | Count | | ||
| | --------- | ------- | | ||
| | receiver | 120 | | ||
| | exporter | 60 | | ||
| | processor | 32 | | ||
| | extension | 48 | | ||
| | connector | 14 | | ||
| | **Total** | **274** | | ||
|
|
||
| ### Coverage Gap | ||
|
|
||
| V1 has more entries than the V2 contrib counts above in every category except connectors, where both list 14 (274 vs 249 in total). The key reasons are: | ||
|
|
||
| 1. V1 includes components from third-party distributions (not just core and contrib). Some exporters | ||
| and extensions in V1 point to external repositories outside the opentelemetry-collector-contrib | ||
| repo. | ||
| 2. V2 currently only tracks core and contrib distributions. | ||
| 3. V1 was built up manually over time; some entries may represent components that have since been | ||
| removed or renamed in contrib, creating stale entries. | ||
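The size of the gap can be read straight off the two count tables. A minimal sketch of that arithmetic, using only the counts quoted in this document (the V2 side covers the contrib distribution only, since core counts are not listed here):

```python
# Counts copied from the V1 and V2 (contrib) coverage tables in this document.
V1_COUNTS = {"receiver": 120, "exporter": 60, "processor": 32, "extension": 48, "connector": 14}
V2_CONTRIB_COUNTS = {"receiver": 113, "exporter": 48, "processor": 31, "extension": 43, "connector": 14}


def coverage_gap(v1, v2):
    """Return the V1 count minus the V2 count for each component type."""
    return {kind: v1[kind] - v2[kind] for kind in v1}


gap = coverage_gap(V1_COUNTS, V2_CONTRIB_COUNTS)
total_gap = sum(gap.values())

if __name__ == "__main__":
    for kind, diff in gap.items():
        print(f"{kind:<10} {diff:+d}")
    print(f"{'total':<10} {total_gap:+d}")
```

Exporters account for the largest share of the difference, which is consistent with reason 1 above (externally hosted exporters listed only in V1), although the counts alone do not prove it.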
|
|
||
| --- | ||
|
|
||
| ## Metadata Fields Comparison | ||
|
|
||
| ### V1 Registry Fields | ||
|
|
||
| Each V1 entry is a single YAML file with these fields: | ||
|
|
||
| | Field | Required | Description | | ||
| | ------------------ | -------- | ------------------------------------------------------------------- | | ||
| | `title` | yes | Human-readable display name | | ||
| | `registryType` | yes | Component type: receiver, exporter, processor, connector, extension | | ||
| | `language` | yes | Set to `collector` for all collector components | | ||
| | `tags` | yes | Array: typically includes `go`, the component type, and `collector` | | ||
| | `license` | yes | License (e.g., Apache 2.0) | | ||
| | `description` | yes | Short description of what the component does | | ||
| | `authors` | yes | Array of objects with `name` field | | ||
| | `urls.repo` | yes | URL to the component source repository | | ||
| | `createdAt` | yes | ISO date when the entry was created | | ||
| | `package.registry` | yes | Package registry type (e.g., `go-collector`) | | ||
| | `package.name` | yes | Full Go module path | | ||
| | `package.version` | yes | Current version, updated by automation | | ||
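As an illustration of how the required-field list above could be checked mechanically, here is a small sketch. It assumes a V1 entry has already been parsed into a Python dict; the function name `missing_fields` and the sample entry are hypothetical and not part of any existing tooling.

```python
# Required V1 fields, as dotted paths, taken from the table above.
REQUIRED_V1_FIELDS = (
    "title", "registryType", "language", "tags", "license", "description",
    "authors", "urls.repo", "createdAt",
    "package.registry", "package.name", "package.version",
)


def missing_fields(entry):
    """Return the dotted paths of required fields absent from a parsed V1 entry."""
    missing = []
    for dotted in REQUIRED_V1_FIELDS:
        node = entry
        for key in dotted.split("."):
            # Stop at the first path segment that is not present.
            if not isinstance(node, dict) or key not in node:
                missing.append(dotted)
                break
            node = node[key]
    return missing


if __name__ == "__main__":
    # Hypothetical, deliberately incomplete entry.
    sample = {"title": "Example Receiver", "package": {"name": "example.com/hypothetical/receiver"}}
    print(missing_fields(sample))
```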
|
|
||
| ### V2 Registry Fields | ||
|
|
||
| V2 entries are stored inside versioned YAML files (one file per component type per version). Each | ||
| component entry includes: | ||
|
|
||
| | Field | Required | Description | | ||
| | --------------------------------------- | -------- | -------------------------------------------------------------------------------- | | ||
| | `name` | yes | Component identifier (e.g., `activedirectorydsreceiver`) | | ||
| | `metadata.type` | yes | Short type name (e.g., `active_directory_ds`) | | ||
| | `metadata.display_name` | no | Human-readable name | | ||
| | `metadata.description` | no | Description | | ||
| | `metadata.status.class` | yes | receiver, processor, exporter, connector, extension | | ||
| | `metadata.status.stability` | yes | Per-signal stability: development, alpha, beta, stable, deprecated, unmaintained | | ||
| | `metadata.status.distributions` | no | Which distributions include this component | | ||
| | `metadata.status.codeowners.active` | no | GitHub handles of active maintainers | | ||
| | `metadata.status.codeowners.emeritus` | no | GitHub handles of emeritus maintainers | | ||
| | `metadata.status.unsupported_platforms` | no | Platforms where component does not work | | ||
| | `metadata.attributes` | no | Attribute definitions emitted by this component | | ||
| | `metadata.metrics` | no | Metric definitions emitted by this component | | ||
| | `metadata.config` | no | JSON Schema definition of the component configuration | | ||
| | `metadata.resource_attributes` | no | Resource attribute definitions | | ||
|
|
||
| ### Field Gap Analysis | ||
|
|
||
| Fields in V1 that V2 does NOT track: | ||
|
|
||
| | V1 Field | Notes | | ||
| | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| | `license` | V2 does not track license per component. Could be sourced from GitHub API (license field on the repo) or inferred since all contrib components are Apache 2.0. | | ||
| | `authors` | V2 has `codeowners`, which is the closest modern equivalent but is not the same as authors. | | ||
| | `tags` | V2 has no tag system. V1 uses tags like `go`, `collector`, `receiver` for filtering on opentelemetry.io. | | ||
| | `createdAt` | V2 does not track when a component was first registered. | | ||
| | `urls.repo` | V2 does not store an explicit repository URL per component, though the source location is derivable from the Go module path. | | ||
|
|
||
| Fields in V2 that V1 does NOT track: | ||
|
|
||
| | V2 Field | Notes | | ||
| | -------------------------- | --------------------------------------------------------------------------------------- | | ||
| | `stability` (per signal) | V1 has no stability level. Users of V1 cannot distinguish alpha from stable components. | | ||
| | `distributions` | V1 does not record which distributions bundle each component. | | ||
| | `codeowners` | V1 does not track maintainers per component. | | ||
| | `attributes` and `metrics` | V1 has no telemetry schema. V2 is the only source for this data. | | ||
| | `config` schema | V1 has no configuration schema. | | ||
| | `unsupported_platforms` | V1 does not record platform restrictions. | | ||
|
|
||
| --- | ||
|
|
||
| ## How V1 is Maintained | ||
|
|
||
| Based on the comment from @svrnm (an OpenTelemetry maintainer) in the issue: | ||
|
|
||
| - **Version updates are automated**: There are scripts that update the `package.version` field in V1 | ||
| entries periodically. | ||
| - **URL health checks run manually**: Every few months a script checks whether URLs in V1 entries | ||
| are still valid. | ||
| - **Everything else is manual**: New component entries are added by hand when someone notices they | ||
| are missing. There is no automated detection of new components being added to contrib. | ||
| - **No stale-entry detection**: There is no formal process to detect when a component is removed from contrib | ||
| and its V1 entry becomes stale. | ||
|
|
||
| --- | ||
|
|
||
| ## How V1 is Used | ||
|
|
||
| 1. **opentelemetry.io registry page**: The `/ecosystem/registry/` section of the website is | ||
| generated from V1 registry YAML files. | ||
| 2. **Hugo shortcodes**: A macro allows documentation pages to embed the current version of a | ||
| component by referencing its package name. The version comes from `package.version` in V1. | ||
| 3. **Version update automation**: A nightly workflow (`.github/workflows/auto-update-registry.yml`) | ||
| runs as `otelbot[bot]` and calls `.github/scripts/update-registry-versions.sh`. That script runs | ||
| `go list -m --versions` against the Go module index for each component and updates the | ||
| `package.version` field in `data/registry/collector-*.yml`. It does not read from V2 at all. | ||
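To make the version-update step in item 3 concrete: `go list -m -versions <module>` prints the module path followed by its known versions on one line, ordered from earliest to latest. The sketch below shows one way to pick the newest version from such a line. It is an illustration in Python, not the actual shell script, and it assumes the last listed version is the one wanted; the real script may apply additional filtering. The module path in the example is hypothetical.

```python
from typing import Optional


def latest_version(go_list_line: str) -> Optional[str]:
    """Pick the last version from one line of `go list -m -versions` output.

    The first whitespace-separated field is the module path; the rest are
    versions in ascending order. Returns None when no versions are listed.
    """
    fields = go_list_line.split()
    versions = fields[1:]
    return versions[-1] if versions else None


if __name__ == "__main__":
    # Hypothetical module path and versions, for illustration only.
    line = "example.com/hypothetical/receiver v0.1.0 v0.2.0"
    print(latest_version(line))
```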
|
|
||
| --- | ||
|
|
||
| ## How collector-sync Works | ||
|
|
||
| The collector-sync script is a Python project at `scripts/collector-sync/` in the opentelemetry.io | ||
| repo. It writes to `data/collector-versions.yml` and | ||
| `data/collector/{receivers,exporters,processors,extensions,connectors}.yml`. Those files are used by | ||
| Hugo `{{< component-link >}}` shortcodes to inject component names and links into documentation | ||
| pages. They are not the per-component registry entries under `data/registry/` that issue 119 is | ||
| about. | ||
|
|
||
| **Important**: collector-sync and the V1 registry (`data/registry/`) are two separate systems. | ||
| collector-sync does not update `data/registry/` and does not read from V2 registry data. | ||
|
|
||
| --- | ||
|
|
||
| ## Summary of Key Findings | ||
|
|
||
| 1. V2 has richer metadata than V1 in almost every dimension that matters for developer tooling | ||
| (stability, codeowners, signal-level telemetry schema). | ||
| 2. V1 has fields that V2 does not track: license, author attribution, and human-assigned tags. These | ||
| fields are important for the opentelemetry.io registry page but are not needed for the Ecosystem | ||
| Explorer. | ||
| 3. V1 and V2 are currently fully independent pipelines. V1 version updates come from the Go module | ||
| index via the otelbot nightly workflow. V2 metadata comes from upstream `metadata.yaml` files via | ||
| collector-watcher. There is no automation today that reads from V2 and writes into the V1 | ||
| per-component registry entries under `data/registry/`. | ||
| 4. V1 has some entries that V2 does not, particularly components from third-party distributions. | ||
| These would need to be handled separately if V2 were ever used to drive V1. | ||
| 5. V1 has no mechanism for detecting new components or removing stale ones automatically. V2 | ||
| automation does this nightly. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,152 @@ | ||
| --- | ||
| title: "Next Steps — Legacy Registry Research" | ||
| issue: 119 | ||
| type: roadmap | ||
| phase: meta | ||
| status: in-progress | ||
| last_updated: "2026-05-11" | ||
| --- | ||
|
|
||
| ## Current State | ||
|
|
||
| Research phase complete. Findings are in [`00-research.md`](./00-research.md). | ||
|
|
||
| **Correction from reviewer feedback**: An earlier version of the research claimed that V2 was | ||
| already feeding V1 via `collector-sync`. That was wrong. `collector-sync` writes to | ||
| `data/collector/` files that power Hugo shortcodes, not to the per-component `data/registry/` | ||
| entries. The V1 registry is updated by a separate `otelbot` nightly workflow that reads from the Go | ||
| module index. V1 and V2 are fully independent today. The question is whether to build a new | ||
| connection between them. | ||
|
|
||
| --- | ||
|
|
||
| ## Proposals | ||
|
|
||
| ### Proposal A: Build a new V2 to data/registry/ sync path | ||
|
|
||
| **What it is**: Write a new script or workflow that reads V2 registry data and updates the matching | ||
| V1 entries under `data/registry/` in the opentelemetry.io repo. Fields to sync would include | ||
| stability level, display name, description, and codeowners. This is a new pipeline, not an extension | ||
| of collector-sync (which does not touch `data/registry/`). | ||
|
|
||
| **Why it helps**: V1 entries would stay more accurate automatically. Users of the opentelemetry.io | ||
| registry would see up-to-date stability information instead of stale or missing data. | ||
|
|
||
| **Challenges**: | ||
|
|
||
| - V1 has fields (license, authors, tags) that V2 does not track. Those fields would still need to be | ||
| maintained manually in V1. | ||
| - The new sync path would need to coexist with the existing `otelbot` nightly job, which is today | ||
| the source of truth for the `package.version` field. | ||
| - The matching logic (V2 component name to V1 entry) would need to handle renames and deprecations | ||
| gracefully. | ||
| - Some V1 entries point to components outside core and contrib (third-party distributions). V2 does | ||
| not have data for these so they would be skipped by automation. | ||
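One way to address the first two challenges is to let the sync write only an agreed list of generated fields and leave everything else alone. The sketch below assumes both entries are already parsed into dicts. The target field names `stability` and `codeowners` are hypothetical, since V1 has no such fields today, and the list of generated fields would need to be agreed with the opentelemetry.io maintainers.

```python
import copy

# Hypothetical set of V1 fields the sync is allowed to overwrite.
# license, authors, tags and package.version are deliberately excluded.
GENERATED_FIELDS = ("title", "description", "stability", "codeowners")


def sync_entry(v1_entry, v2_component):
    """Return a copy of v1_entry with generated fields refreshed from V2."""
    meta = v2_component.get("metadata", {})
    status = meta.get("status", {})
    candidate = {
        "title": meta.get("display_name"),
        "description": meta.get("description"),
        "stability": status.get("stability"),
        "codeowners": status.get("codeowners", {}).get("active"),
    }
    updated = copy.deepcopy(v1_entry)
    for field in GENERATED_FIELDS:
        # Optional V2 fields may be absent; keep the existing V1 value then.
        if candidate[field] is not None:
            updated[field] = candidate[field]
    return updated
```

Because the function never touches `package.version`, it can run alongside the existing `otelbot` job without the two writing the same field.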
|
|
||
| **Effort**: Medium. Requires writing a new sync script and integrating it with the opentelemetry.io | ||
| automation. | ||
|
|
||
| --- | ||
|
|
||
| ### Proposal B: Use V2 to auto-generate new V1 entries for components that exist in V2 but not V1 | ||
|
Member
I think moving in this direction makes sense as a starting point. One challenge is that the V1 registry is currently maintained entirely by hand, so introducing a mix of manually maintained and automatically generated data could become confusing. If we go that route, we’d probably need a clear way to indicate which fields are generated so contributors know not to edit them manually. I think the next step is to explore how far we can realistically push the automation and get to the point where we can run some dry-run experiments. To get there, we should start by identifying the current gaps and breaking them into smaller steps. For example, an initial step could be investigating whether we can extend the existing watcher to derive license information automatically, or determine whether the tags are actually necessary and, if so, whether they could also be generated automatically.
Contributor
Author
Thanks for the feedback! That is a really valid concern. Since V1 is fully I also added a suggested path for breaking this into smaller steps before |
||
|
|
||
| **What it is**: Run a comparison between V2 (core + contrib) and V1 on a schedule. For any component | ||
| in V2 that has no matching V1 entry, generate a draft V1 entry with the fields that V2 can supply | ||
| and open a PR for human review before merging. | ||
|
|
||
| **Why it helps**: New components added to contrib would automatically get a V1 registry entry | ||
| without waiting for someone to notice they are missing. This addresses the biggest gap in V1 | ||
| maintenance today. | ||
|
|
||
| **Challenges**: | ||
|
|
||
| - The generated entry would be missing V1-only fields (license, authors, tags). A review step is | ||
| essential. | ||
| - Need to establish a reliable matching key between the two registries (Go module path is the most | ||
| stable option). | ||
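The comparison itself reduces to a set difference once both sides are keyed by Go module path. A minimal sketch, assuming V1 entries are parsed dicts whose `package.name` holds the module path (as described in the V1 field table) and that module paths for V2 components have already been derived by some other step; the function name is hypothetical.

```python
def missing_from_v1(v1_entries, v2_module_paths):
    """Return V2 module paths that have no V1 entry, sorted for stable output."""
    known = {
        entry["package"]["name"]
        for entry in v1_entries
        if "name" in entry.get("package", {})
    }
    return sorted(set(v2_module_paths) - known)


if __name__ == "__main__":
    # Hypothetical module paths, for illustration only.
    v1 = [{"package": {"name": "example.com/hypothetical/a"}}]
    v2 = ["example.com/hypothetical/a", "example.com/hypothetical/b"]
    print(missing_from_v1(v1, v2))
```

Each path returned would become one draft entry in the review PR. Renamed components would show up as missing under their new path, so the review step still matters.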
|
|
||
| **Effort**: Medium. Requires a new script or extending the collector-watcher. | ||
|
|
||
| --- | ||
|
|
||
| ### Proposal C: Add a V1 sync task to the existing collector-watcher | ||
|
|
||
| **What it is**: Add a new output step to the ecosystem-automation/collector-watcher that, in | ||
| addition to writing V2 registry YAML files, also writes or updates V1-format YAML files in a staging | ||
| folder. These staged files can then be submitted to opentelemetry.io via the collector-sync | ||
| workflow. | ||
|
|
||
| **Why it helps**: Keeps the two registries in sync as a single automated pipeline rather than two | ||
| separate tools. The collector-watcher already runs nightly so V1 would get the same freshness. | ||
|
|
||
| **Challenges**: | ||
|
|
||
| - Requires coordination with the opentelemetry.io maintainers to integrate the staging output. | ||
| - V1-only fields (license, authors) would need a side-channel data source. | ||
|
|
||
| **Effort**: Medium-high. Requires changes to collector-watcher and a new integration with | ||
| opentelemetry.io automation. | ||
|
|
||
| --- | ||
|
|
||
| ### Proposal E: Invert the relationship — make V2 the source of truth and V1 a generated artifact | ||
|
|
||
| **What it is**: Instead of syncing V2 data into V1, flip the ownership model entirely. A new nightly | ||
| emitter in `ecosystem-automation/` reads V2 registry data and regenerates | ||
| `opentelemetry.io/data/registry/collector-*.yml` directly. For the fields V2 does not yet carry | ||
| (license, authors, tags), a thin per-entry sidecar file is maintained alongside the emitter. | ||
| `createdAt` is derivable from release history and does not need to be stored manually. The `otelbot` | ||
| nightly job is retired because the emitter sets `package.version` directly. | ||
|
|
||
| **Why it helps**: The opentelemetry.io website does not change at all. Same URLs, same Hugo | ||
| templates, same `version-from-registry` shortcode, same MiniSearch index. Nothing user-facing moves. | ||
| Compared to Proposal A, this approach removes a moving part (otelbot) rather than adding a new sync | ||
| layer that has to coexist with otelbot writing the same fields. As other ecosystem watchers come | ||
| online, the sidecar shrinks and the emitter coverage grows naturally. | ||
|
|
||
| **Challenges**: | ||
|
|
||
| - Requires writing a new emitter in ecosystem-automation and coordinating its output format with the | ||
| opentelemetry.io maintainers. | ||
| - The sidecar file for V1-only fields (license, authors, tags) needs a clear ownership model and an | ||
| initial population pass. | ||
| - Retiring otelbot requires agreement from the opentelemetry.io team since it currently owns version | ||
| updates for all registry entries, not just collector ones. | ||
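A rough sketch of what the emitter's core step might look like, under several assumptions: the field mapping from V2 to V1 shown here is a guess rather than an agreed format, the V2 release version is assumed to be usable as `package.version`, and the sidecar is assumed to be a flat dict of V1-only fields. The function name `emit_v1_entry` is hypothetical.

```python
def emit_v1_entry(v2_component, release_version, sidecar):
    """Build a V1-style entry from a V2 component plus sidecar-only fields."""
    meta = v2_component["metadata"]
    entry = {
        # Fall back to the component identifier when no display name is set.
        "title": meta.get("display_name") or v2_component["name"],
        "registryType": meta["status"]["class"],
        "language": "collector",
        "description": meta.get("description", ""),
        "package": {"registry": "go-collector", "version": release_version},
    }
    # Sidecar supplies what V2 does not carry (license, authors, tags).
    # This is a shallow merge: a sidecar key replaces a generated one.
    entry.update(sidecar)
    return entry
```

Fields such as `package.name`, `urls.repo` and `createdAt` are left out because this document does not establish where the emitter would read them from.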
|
|
||
| **Effort**: Medium-high. New emitter plus coordination across two repositories. | ||
|
|
||
| --- | ||
|
|
||
| ### Proposal D: Document and stabilize the current state before changing anything | ||
|
|
||
| **What it is**: Before automating further, write a clear description of the current V1 schema, | ||
| document which fields are auto-managed vs manually maintained, and get agreement from | ||
| opentelemetry.io maintainers on which proposals they would accept. | ||
|
|
||
| **Why it helps**: Changes to V1 affect the opentelemetry.io website directly. Doing this without | ||
| maintainer alignment risks breaking things or creating PRs that will be closed. | ||
|
|
||
| **Effort**: Low. Primarily communication and documentation work. | ||
|
|
||
| --- | ||
|
|
||
| ## Recommended Order | ||
|
|
||
| 1. **Start with Proposal D**: Share this research document with the opentelemetry.io maintainers and | ||
| the ecosystem-explorer team to get alignment on direction before writing code. | ||
| 2. **Then Proposal A**: Build the new sync path that copies stability and description from V2 into V1 | ||
| (a new pipeline, not an extension of collector-sync). This is the lowest-risk and highest-value first step. | ||
| 3. **Then Proposal B**: Automate detection of new components missing from V1. | ||
| 4. **Proposal C** is the long-term goal but requires the most coordination. | ||
|
|
||
| --- | ||
|
|
||
| ## Open Questions | ||
|
|
||
| 1. Are there V1 entries for components that have been removed from contrib? If so, should they be | ||
| marked deprecated or removed from V1? | ||
| 2. Who owns the V1 registry day-to-day? Is there a team or just individual contributors? | ||
| 3. Does the Hugo shortcode version-injection rely on the exact `package.version` format in V1? If V2 | ||
| changes its versioning scheme, would that break the shortcode? | ||
| 4. Should third-party distribution components (not in core or contrib) ever be added to V2, or | ||
| should V1 remain the only place for those? | ||
great analysis. One more thing I'd love to understand though is how users use this information too. As in, what do we think they are looking for when they visit the existing registry, what value can they get from it?
Really good point, I had not thought about it from the user side. I added
a subsection covering what users are actually trying to do when they visit
the registry. The main things I found are discovering components by type
or name, looking up the current version to pin in a config, checking
whether something is stable or still alpha, finding the source repo to
file a bug, and sometimes looking for who maintains a component.
Version lookup and discovery seem to be the most common day-to-day needs.
The stability check is the biggest gap right now since V1 has no stability
field at all, which is exactly where V2 data would add the most value if
synced across. Let me know if you think I missed any important use case.