docs: add research findings for issue 119 V1 vs V2 registry comparison #446
Add projects/119-legacy-registry-research/ with three documents:
- _index.md: folder landing page following the existing conventions
- 00-research.md: full findings covering component coverage counts, metadata field comparison, V1 maintenance model, and how collector-sync currently uses V2 data to update V1
- NEXT-STEPS.md: four concrete proposals for how V2 can automate or drive V1 maintenance going forward, with recommended order
…h docs
- Run Prettier (proseWrap, 100-char printWidth) on all 3 research files
- Fix MD036: change **Contrib distribution** bold to #### heading
- Markdownlint and format-check now pass with 0 errors

Co-Authored-By: Rama542 <Rama542@users.noreply.github.com>
Hi @Rama542 ! The research says The actual mechanism that bumps So V1 and V2 are currently fully independent pipelines:
Two implications:
The research incorrectly stated that collector-sync reads from V2 and updates V1 registry entries. On review, collector-sync only writes to data/collector/ files which power Hugo shortcodes, not to the per-component data/registry/ entries. The actual V1 registry version updates come from a separate otelbot nightly workflow that reads from the Go module index. V1 and V2 are fully independent pipelines today. Updated 00-research.md to fix the How V1 is Used section, the How collector-sync Works section, and Key Finding 3. Updated NEXT-STEPS.md to correct the Current State summary and reframe Proposal A as building a new sync path rather than extending collector-sync.
Thank you for the detailed correction; that was a significant research error on my part. I verified the collector-sync source and you are right: it only writes to data/collector/. I have pushed a correction commit that fixes:
Let me know if anything else looks off.
Following up on the next-steps proposals. The four options as listed all assume V1 stays the source of truth and V2 feeds it. Worth considering inverting that: V2 becomes the source, V1 becomes a generated artifact. Concretely:
Looking at the Hugo template that renders the registry page, most of the fields it reads are already covered by V2 or derivable. The only ones requiring schema growth in V2 or a sidecar file are Compared to Proposal A, it removes a moving part (otelbot) instead of adding one (a sync from V2 into V1 that has to coexist with otelbot writing the same fields). The path also extends naturally to other ecosystems as their watchers come online: the sidecar shrinks, the emitter coverage grows. Not saying this is necessarily the right direction, but it seems like a viable option worth including as a fifth proposal in
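To make the inverted flow a bit more concrete, here is a minimal sketch of what such an emitter could look like. Everything in it is an assumption for illustration: the function name `emit_v1_entry`, the V2 keys (`name`, `type`, `version`), and the V1-style keys are hypothetical and not taken from the actual schemas. It only shows the idea that V2-derived values win and the sidecar fills whatever V2 cannot provide.

```python
from typing import Any


def emit_v1_entry(v2_component: dict[str, Any], sidecar: dict[str, Any]) -> dict[str, Any]:
    """Build a V1-style entry from V2 data plus a hand-maintained sidecar.

    All key names here are hypothetical placeholders, not the real schemas.
    """
    # Fields assumed to be derivable from V2.
    entry: dict[str, Any] = {
        "title": v2_component["name"],
        "registryType": v2_component["type"],
        "version": v2_component["version"],
    }
    # The sidecar only fills gaps; it never overrides a V2-derived value.
    for key, value in sidecar.items():
        entry.setdefault(key, value)
    return entry
```

With this shape, the sidecar shrinks naturally: once V2 starts providing a field, the sidecar value for it is simply ignored and can be deleted.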
That is a really good point. I had framed all the proposals around feeding V2 data I have added this as Proposal E in NEXT-STEPS.md so the team can compare it
Hey @lucacavenaghi97, thanks for the approval and for the feedback that led to Proposal E! I can see there are 2 pending reviews still open. Could you let me know if there's anything else needed from my side before this gets merged, or an approximate timeline for when it might land? Happy to make any adjustments if needed. Thanks!
### Proposal B: Use V2 to auto-generate new V1 entries for components that exist in V2 but not V1
I think moving in this direction makes sense as a starting point. One challenge is that the V1 registry is currently maintained entirely by hand, so introducing a mix of manually maintained and automatically generated data could become confusing. If we go that route, we’d probably need a clear way to indicate which fields are generated so contributors know not to edit them manually.
I think the next step is to explore how far we can realistically push the automation and get to the point where we can run some dry-run experiments.
To get there, we should start by identifying the current gaps and breaking them into smaller steps. For example, an initial step could be investigating whether we can extend the existing watcher to derive license information automatically, or determine whether the tags are actually necessary and, if so, whether they could also be generated automatically.
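One possible shape for the "which fields are generated" convention, as a rough sketch only: each entry carries a marker listing its generated fields, and a check flags manual edits that touch them. The `_generated` key and the function name are hypothetical; nothing like this exists in the V1 registry today.

```python
def changed_generated_fields(old: dict, new: dict) -> list[str]:
    """Return the generated fields whose values differ between two versions of an entry.

    Assumes a hypothetical `_generated` key that lists the auto-generated field names.
    """
    generated = old.get("_generated", [])
    return sorted(field for field in generated if old.get(field) != new.get(field))
```

A check like this could run in CI on manual pull requests and fail when the returned list is non-empty, while hand-maintained fields stay freely editable.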
Thanks for the feedback! That is a really valid concern. Since V1 is fully
hand-maintained right now, dropping auto-generated entries into the same
files without any clear indication could easily lead to contributors
accidentally overwriting generated data. I added a note about needing a
convention to mark which fields are auto-generated so people know not to
touch them manually.
I also added a suggested path for breaking this into smaller steps before
jumping straight to full automation. The two initial investigation points
are whether license can just be inferred (since all contrib components are
Apache 2.0 anyway) and whether tags are predictable enough from component
type to be generated. If those two work out, the only fields still needing
human input would be authors and createdAt, which makes the review process
much easier. From there a dry run experiment would be a good way to
validate before committing to anything.
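As a rough illustration of what a dry run could produce, here is a small sketch that drafts entries for components present in V2 but missing from V1, without writing anything. The input shapes, the key names, and the tag rule (`["collector", <type>]`) are assumptions made for the example; only the Apache 2.0 inference and the two human-input fields come from the discussion above.

```python
HUMAN_FIELDS = ("authors", "createdAt")


def draft_missing_entries(v1_names: set[str], v2_components: list[dict]) -> list[dict]:
    """Draft V1-style entries for V2 components that V1 does not list yet.

    Nothing is written to disk; the drafts are returned for review.
    Key names and the tag rule are hypothetical.
    """
    drafts = []
    for component in v2_components:
        if component["name"] in v1_names:
            continue  # already present in V1, nothing to draft
        drafts.append(
            {
                "name": component["name"],
                "license": "Apache 2.0",  # inferred, assuming a contrib component
                "tags": ["collector", component["type"]],  # assumed tag rule
                "needs_human_input": list(HUMAN_FIELDS),
            }
        )
    return drafts
```

Reviewing the returned drafts against a handful of existing hand-written entries would show quickly whether the inferred license and tags are close enough to be trusted.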
## How V1 is Used
Great analysis. One more thing I'd love to understand, though, is how users use this information. What do we think they are looking for when they visit the existing registry, and what value can they get from it?
Really good point, I had not thought about it from the user side. I added
a subsection covering what users are actually trying to do when they visit
the registry. The main things I found are discovering components by type
or name, looking up the current version to pin in a config, checking
whether something is stable or still alpha, finding the source repo to
file a bug, and sometimes looking for who maintains a component.
Version lookup and discovery seem to be the most common day-to-day needs.
The stability check is the biggest gap right now since V1 has no stability
field at all, which is exactly where V2 data would add the most value if
synced across. Let me know if you think I missed any important use case.
Closes #119
This PR adds the research workspace for issue 119 under projects/119-legacy-registry-research/ following the existing conventions used by the 84-ui-ux-design folder.
Here is the current state of both registries that this research covers:
(Screenshot 1 and Screenshot 2: images not included)
Three documents are included:
_index.md is the stable folder landing page that describes what is in the folder and where to start.
00-research.md is the full research document. It covers:
NEXT-STEPS.md proposes four concrete options for how V2 can drive V1 maintenance going forward, with a recommended order starting from the lowest-risk step of expanding collector-sync to sync more fields, then automating detection of new components missing from V1.
The key finding is that V2 is already feeding V1 through the collector-sync pipeline. The V1 registry is currently the more complete and stable surface for users. Expanding the V2 to V1 connection incrementally is the safest path forward.