docs: add research findings for issue 119 V1 vs V2 registry comparison #446

Merged
jaydeluca merged 11 commits into open-telemetry:main from Rama542:research/119-legacy-registry on May 12, 2026

@Rama542
Contributor

@Rama542 Rama542 commented May 11, 2026

Closes #119

This PR adds the research workspace for issue 119 under projects/119-legacy-registry-research/ following the existing conventions used by the 84-ui-ux-design folder.

Here is the current state of both registries that this research covers:

[Screenshot 1: Screenshot 2026-05-11 180024]

[Screenshot 2: Screenshot 2026-05-11 180036]

Three documents are included:

_index.md is the stable folder landing page that describes what is in the folder and where to start.

00-research.md is the full research document. It covers:

  • Component counts in V1 (274 collector components across receivers, exporters, processors, extensions, connectors) and V2 (249 in contrib at v0.151.0 plus core)
  • A side-by-side comparison of all metadata fields in each registry, including what V1 has that V2 does not (license, authors, tags, createdAt) and what V2 has that V1 does not (stability levels, distributions, codeowners, attributes, metrics, config schema)
  • How V1 is currently maintained, which is mostly manual except for version updates
  • How the collector-sync script in opentelemetry.io already reads from V2 to update V1 version fields, meaning V2 is already the upstream for V1 version data

NEXT-STEPS.md proposes four concrete options for how V2 can drive V1 maintenance going forward, with a recommended order starting from the lowest-risk step of expanding collector-sync to sync more fields, then automating detection of new components missing from V1.

The key finding is that V2 is already feeding V1 through the collector-sync pipeline. The V1 registry is currently the more complete and stable surface for users. Expanding the V2 to V1 connection incrementally is the safest path forward.

Add projects/119-legacy-registry-research/ with three documents:
- _index.md: folder landing page following the existing conventions
- 00-research.md: full findings covering component coverage counts,
  metadata field comparison, V1 maintenance model, and how
  collector-sync currently uses V2 data to update V1
- NEXT-STEPS.md: four concrete proposals for how V2 can automate
  or drive V1 maintenance going forward, with recommended order
@Rama542 Rama542 requested review from a team as code owners May 11, 2026 12:33
@netlify

netlify Bot commented May 11, 2026

Deploy Preview for otel-ecosystem-explorer ready!

Name Link
🔨 Latest commit c056680
🔍 Latest deploy log https://app.netlify.com/projects/otel-ecosystem-explorer/deploys/6a033cd972c67f0008755db7
😎 Deploy Preview https://deploy-preview-446--otel-ecosystem-explorer.netlify.app

Rama542 and others added 2 commits May 11, 2026 18:18
…h docs

- Run Prettier (proseWrap, 100-char printWidth) on all 3 research files
- Fix MD036: change **Contrib distribution** bold to #### heading
- Markdownlint and format-check now pass with 0 errors

Co-Authored-By: Rama542 <Rama542@users.noreply.github.com>
@lucacavenaghi97
Member

lucacavenaghi97 commented May 11, 2026

Hi @Rama542! The research says collector-sync updates V1 registry entries with versions from V2. Reading the script source, collector-sync only writes to data/collector-versions.yml and data/collector/{receivers,exporters,processors,extensions,connectors}.yml. The string data/registry does not appear anywhere in scripts/collector-sync/. Those files feed the Hugo {{< component-link >}} shortcodes, not the per-component registry that the issue is about.
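
The "string does not appear anywhere" check is easy to reproduce against a local checkout. A minimal Python sketch, equivalent to a recursive grep; the helper name `files_mentioning` is hypothetical and not part of either repository:

```python
from pathlib import Path


def files_mentioning(root: str, needle: str) -> list[str]:
    """Return the files under `root` whose text contains `needle`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if needle in text:
            # Report paths relative to the search root, with forward slashes.
            hits.append(path.relative_to(root).as_posix())
    return hits
```

If the observation above holds, `files_mentioning("scripts/collector-sync", "data/registry")` run from the root of an opentelemetry.io checkout should return an empty list.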

The actual mechanism that bumps package.version inside data/registry/collector-*.yml is a separate nightly workflow: .github/workflows/auto-update-registry.yml plus .github/scripts/update-registry-versions.sh. It runs go list -m --versions against the Go module index for each file and opens a PR as otelbot[bot]. It does not read from V2 at all.
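
To make the version-bump step concrete, here is a rough Python sketch of turning `go list -m -versions` output into a single version string. The command prints the module path followed by its known versions on one line. How update-registry-versions.sh actually selects a version is not shown in this thread, so choosing the highest stable `vX.Y.Z` tag is an assumption, and `latest_release` is a hypothetical helper name:

```python
import re
from typing import Optional

# Matches plain release tags only; pre-releases like v1.0.0-rc1 do not match.
_SEMVER = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_release(go_list_output: str) -> Optional[str]:
    """Pick the highest stable vX.Y.Z from `go list -m -versions` output."""
    fields = go_list_output.split()
    stable = []
    for token in fields[1:]:  # fields[0] is the module path
        match = _SEMVER.match(token)
        if match:
            # Compare numerically so that v0.100.0 sorts above v0.9.0.
            key = tuple(int(group) for group in match.groups())
            stable.append((key, token))
    return max(stable)[1] if stable else None
```

Comparing the parsed integers rather than the raw strings matters here, because a plain string sort would rank `v0.9.0` above `v0.100.0`.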

So V1 and V2 are currently fully independent pipelines:

  • V1 versions come from the Go module index, nightly, via otelbot.
  • V2 metadata comes from upstream metadata.yaml files via collector-watcher.

Two implications:

  1. The "V2 is already feeding V1" key finding needs rewording. The V2 to data/registry/ pipeline does not exist today.
  2. Proposal A becomes "build a new V2 to data/registry/ sync path" rather than "extend collector-sync." Worth thinking about how it would interact with the existing nightly otelbot job (which today is the source of truth for the package.version field).

Rama542 and others added 2 commits May 12, 2026 00:23
The research incorrectly stated that collector-sync reads from V2 and
updates V1 registry entries. On review, collector-sync only writes to
data/collector/ files which power Hugo shortcodes, not to the
per-component data/registry/ entries.

The actual V1 registry version updates come from a separate otelbot
nightly workflow that reads from the Go module index. V1 and V2 are
fully independent pipelines today.

Updated 00-research.md to fix the How V1 is Used section, the
How collector-sync Works section, and Key Finding 3. Updated
NEXT-STEPS.md to correct the Current State summary and reframe
Proposal A as building a new sync path rather than extending
collector-sync.
@Rama542
Contributor Author

Rama542 commented May 11, 2026

Thank you for the detailed correction; that was a significant research error on my part.

I verified the collector-sync source and you are right. It only writes to data/collector/
files for the Hugo shortcodes and never touches data/registry/. I also looked at
auto-update-registry.yml and update-registry-versions.sh and confirmed that is the actual
mechanism keeping package.version current in data/registry/, using the Go module index via
go list, with no connection to V2 at all.

I have pushed a correction commit that fixes:

  • The "How V1 is Used" section now describes the otelbot nightly workflow correctly
  • The "How collector-sync Works" section now accurately says it writes to data/collector/
    for Hugo shortcodes, not to data/registry/
  • Key Finding 3 now says V1 and V2 are fully independent pipelines today
  • Proposal A is reframed as building a new sync path rather than extending collector-sync
  • The NEXT-STEPS.md Current State summary is corrected as well

Let me know if anything else looks off.

@lucacavenaghi97
Member

Following up on the next-steps proposals. The four options as listed all assume V1 stays the source of truth and V2 feeds it. Worth considering inverting that: V2 becomes the source, V1 becomes a generated artifact.

Concretely:

  • opentelemetry.io/data/registry/collector-*.yml is regenerated nightly by a new emitter in ecosystem-automation/, reading V2 plus a thin per-entry sidecar for fields V2 does not carry (license, authors, tags). The createdAt field is derivable from release history.
  • The otelbot nightly job is retired since the emitter sets package.version directly.
  • opentelemetry.io itself does not change. Same URL, same Hugo template, same version-from-registry shortcode, same MiniSearch index. Nothing user-facing moves.

Looking at the Hugo template that renders the registry page, most of the fields it reads are already covered by V2 or derivable. The only ones requiring schema growth in V2 or a sidecar file are license, authors, and tags. The work to extend V2 is therefore contained.
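
A rough sketch of what such an emitter could look like for a single component, in Python. This is illustrative only: `emit_v1_entry` is a hypothetical function, and the V2 key names (`name`, `version`, `first_release_date`) and the output key `title` are assumptions, since neither schema is spelled out in this thread. Only `package.version`, `license`, `authors`, `tags`, and `createdAt` come from the discussion above.

```python
# Fields V2 does not carry, per the comment above; supplied by the sidecar.
SIDECAR_FIELDS = ("license", "authors", "tags")


def emit_v1_entry(v2_entry: dict, sidecar: dict) -> dict:
    """Build one V1-style registry entry from V2 data plus sidecar fields."""
    missing = [field for field in SIDECAR_FIELDS if field not in sidecar]
    if missing:
        # Fail loudly rather than emit an entry with silently empty fields.
        raise ValueError(f"sidecar is missing fields: {missing}")
    entry = {
        "title": v2_entry["name"],  # hypothetical key names on both sides
        # Set directly by the emitter, which is what would retire the otelbot bump.
        "package": {"version": v2_entry["version"]},
        # Assumed to be precomputed from release history by the caller.
        "createdAt": v2_entry["first_release_date"],
    }
    for field in SIDECAR_FIELDS:
        entry[field] = sidecar[field]
    return entry
```

Raising on a missing sidecar field is one possible design choice: it keeps the sidecar honest while it shrinks, at the cost of blocking a regeneration run until the gap is filled.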

Compared to Proposal A, it removes a moving part (otelbot) instead of adding one (a sync from V2 into V1 that has to coexist with otelbot writing the same fields). The path also extends naturally to other ecosystems as their watchers come online: the sidecar shrinks, the emitter coverage grows.

Not saying this is necessarily the right direction, but it seems like a viable option worth including as a fifth proposal in NEXT-STEPS.md, so the team can weigh it alongside the others when picking the next step.

Member

@lucacavenaghi97 lucacavenaghi97 left a comment

LGTM!

@Rama542
Contributor Author

Rama542 commented May 11, 2026

That is a really good point. I had framed all the proposals around feeding V2 data
into V1, but inverting the ownership so V1 becomes a generated output is a cleaner
architecture. Removing otelbot rather than working around it makes more sense, and
keeping the sidecar small means the scope stays contained.

I have added this as Proposal E in NEXT-STEPS.md so the team can compare it
alongside the others. Let me know if the framing captures what you had in mind or
if anything needs adjusting.

@Rama542
Contributor Author

Rama542 commented May 12, 2026

Hey @lucacavenaghi97, thanks for the approval and for the feedback that led to Proposal E!

I can see there are 2 pending reviews still open. Could you let me know if there's anything else needed from my side before this gets merged, or an approximate timeline for when it might land?

Happy to make any adjustments if needed. Thanks!


---

### Proposal B: Use V2 to auto-generate new V1 entries for components that exist in V2 but not V1
Member

I think moving in this direction makes sense as a starting point. One challenge is that the V1 registry is currently maintained entirely by hand, so introducing a mix of manually maintained and automatically generated data could become confusing. If we go that route, we’d probably need a clear way to indicate which fields are generated so contributors know not to edit them manually.

I think the next step is to explore how far we can realistically push the automation and get to the point where we can run some dry-run experiments.

To get there, we should start by identifying the current gaps and breaking them into smaller steps. For example, an initial step could be investigating whether we can extend the existing watcher to derive license information automatically, or determine whether the tags are actually necessary and, if so, whether they could also be generated automatically.
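
One way to picture the generated-versus-manual split is a merge step that overwrites only the fields declared as generated and reports any that had drifted. This is a sketch under assumptions, not a proposal for the actual convention: the `GENERATED_FIELDS` set and the `merge_entry` helper are hypothetical, and which fields end up generated is exactly what the investigation above would decide.

```python
# Hypothetical split: these fields are owned by automation, the rest by humans.
GENERATED_FIELDS = frozenset({"package", "stability"})


def merge_entry(existing: dict, generated: dict) -> tuple[dict, list[str]]:
    """Overwrite generated fields and keep hand-maintained ones.

    Returns the merged entry plus the generated fields whose previous value
    differed from the regenerated one (a manual edit or an upstream change).
    """
    merged = dict(existing)
    drifted = []
    for field in sorted(GENERATED_FIELDS):
        if field not in generated:
            continue
        if field in existing and existing[field] != generated[field]:
            drifted.append(field)
        merged[field] = generated[field]
    return merged, drifted
```

In a dry run, the `drifted` list could be printed instead of written, which gives reviewers a view of what automation would change before anything is committed.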

Contributor Author

Thanks for the feedback! That is a really valid concern. Since V1 is fully
hand-maintained right now, dropping auto-generated entries into the same
files without any clear indication could easily lead to contributors
accidentally overwriting generated data. I added a note about needing a
convention to mark which fields are auto generated so people know not to
touch them manually.

I also added a suggested path for breaking this into smaller steps before
jumping straight to full automation. The two initial investigation points
are whether license can just be inferred (since all contrib components are
Apache 2.0 anyway) and whether tags are predictable enough from component
type to be generated. If those two work out, the only fields still needing
human input would be authors and createdAt, which makes the review process
much easier. From there a dry run experiment would be a good way to
validate before committing to anything.
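
The dry run mentioned above could start as small as the following Python sketch. It drafts entries for components that exist in V2 but not in V1 and writes nothing. The function name, the input shapes, and the tag scheme are all hypothetical; the uniform "Apache 2.0" license reflects the assumption stated in the comment, not a verified fact about every component.

```python
def dry_run_new_entries(v1_names: set[str], v2_components: dict[str, str]) -> list[dict]:
    """Draft entries for components present in V2 but absent from V1.

    `v2_components` maps component name to component type (e.g. "receiver").
    Nothing is written; the drafts are returned for human review.
    """
    drafts = []
    for name in sorted(set(v2_components) - v1_names):
        kind = v2_components[name]
        drafts.append({
            "title": name,
            "license": "Apache 2.0",       # assumed uniform for contrib components
            "tags": ["collector", kind],   # hypothetical tag scheme derived from type
            "needs_human": ["authors", "createdAt"],  # fields still requiring input
        })
    return drafts
```

Listing the remaining human-input fields on each draft keeps the review scope explicit, which matches the goal of getting down to authors and createdAt only.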


---

## How V1 is Used
Member

Great analysis. One more thing I'd love to understand, though, is how users actually use this information: what do we think they are looking for when they visit the existing registry, and what value do they get from it?

Contributor Author

Really good point, I had not thought about it from the user side. I added
a subsection covering what users are actually trying to do when they visit
the registry. The main things I found are discovering components by type
or name, looking up the current version to pin in a config, checking
whether something is stable or still alpha, finding the source repo to
file a bug, and sometimes looking for who maintains a component.

Version lookup and discovery seem to be the most common day-to-day needs.
The stability check is the biggest gap right now since V1 has no stability
field at all, which is exactly where V2 data would add the most value if
synced across. Let me know if you think I missed any important use case.

@jaydeluca jaydeluca merged commit 6a4f67e into open-telemetry:main May 12, 2026
15 checks passed
@otelbot
Contributor

otelbot Bot commented May 12, 2026

Thank you for your contribution @Rama542! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey.

Development

Successfully merging this pull request may close these issues.

Research integration with legacy registry

3 participants